A Class of Prediction-Correction Methods for
Time-Varying Convex Optimization

Andrea Simonetto*, Aryan Mokhtari†, Alec Koppel†, Geert Leus*, and Alejandro Ribeiro†


Abstract: This paper considers unconstrained convex optimization
problems with time-varying objective functions. We propose algorithms
with a discrete time-sampling scheme to find and track the solution
trajectory based on prediction and correction steps, while sampling the
problem data at a constant rate of $1/h$, where $h$ is the sampling
period. The prediction step is derived by analyzing the iso-residual
dynamics of the optimality conditions. The correction step adjusts for
the distance between the current prediction and the optimizer at each
time step, and consists of either one or multiple gradient steps or
Newton steps, which respectively correspond to the gradient trajectory
tracking (GTT) or Newton trajectory tracking (NTT) algorithms. Under
suitable conditions, we establish that the asymptotic error incurred by
both proposed methods behaves as $O(h^2)$, and in some cases as
$O(h^4)$, which outperforms the state-of-the-art error bound of $O(h)$
for correction-only methods in the gradient-correction step. Moreover,
when the characteristics of the objective function variation are not
available, we propose approximate gradient and Newton tracking
algorithms (AGT and ANT, respectively) that still attain these
asymptotical error bounds. Numerical simulations demonstrate the
practical utility of the proposed methods and that they improve upon
existing techniques by several orders of magnitude.

Index Terms: Time-varying optimization, non-stationary optimization,
parametric programming, prediction-correction methods.

I. Introduction

In this paper, we consider unconstrained optimization problems whose
objective functions vary continuously in time. In particular, consider
a variable $x \in \mathbb{R}^n$ and a non-negative continuous time
variable $t \in \mathbb{R}_+$, which determine the choice of a smooth
strongly convex function $f : \mathbb{R}^n \times \mathbb{R}_+ \to \mathbb{R}$.
We study the problem

$x^*(t) := \operatorname{argmin}_{x \in \mathbb{R}^n} f(x; t)$, for $t \ge 0$. (1)

Our goal is to determine the solution x* (t) of (1) for each time 
t which corresponds to the solution trajectory. Time-varying 
optimization problems of the form (1) arise in control [3]-[5], 
when, for instance, one is interested in generating a control 
action such that the system remains close to a dynamical 
reference trajectory, as well as in signal processing [6], where
one seeks to estimate a dynamical process based on time-varying
observations. Other examples arise in robotics [7]-[11]
and economics [12]. 

The problem in (1) can be solved based on a continuous-time platform
[13]-[16] or can be interpreted as a sequence of time-invariant
problems. In particular, one could sample the objective functions
$f(x; t)$ at time instants $t_k$ with $k = 0, 1, 2, \dots$, and
sampling period $h = t_k - t_{k-1}$, arbitrarily close to each other
and then solve the resulting time-invariant problems

$x^*(t_k) := \operatorname{argmin}_{x \in \mathbb{R}^n} f(x; t_k)$. (2)

By decreasing h, an arbitrary accuracy may be achieved 
when approximating (1) by (2). However, solving (2) for each 
sampling time $t_k$ is not a viable option in most application
domains, even for moderate-size problems. The requisite
computation time for solving each instance of the problem often
does not meet the requirements for real-time applicability, 
as in the control domain [17]. It is also challenging to 
reasonably bound the time each problem instance will take 
to be solved [18]. In short, the majority of iterative methods 
for convex problems with static objectives may not be easily 
extended to handle time-varying objectives, with the exception 
of when the changes in the objective occur more slowly than 
the time necessary for computing the optimizer. 

Instead, we consider using the tools of non-stationary optimization
[19]-[22], [23, Chapter 6] to solve problems of the form (1). In these
works the authors consider perturbations of the time-varying problem
when an initial solution $x^*(t_0)$ is known. More recently, the work
presented in [24] designs a gradient method for unconstrained
optimization problems using an arbitrary starting point, which achieves
a $\|x(t_k) - x^*(t_k)\| = O(h)$ asymptotic error bound with respect to
the optimal trajectory. Time-varying optimization has also been studied
in the context of parametric programming, where the optimization
problem is parametrized over a parameter vector $p \in \mathbb{R}^p$
that may represent time, as studied in [25]-[27]. Tracking algorithms
for optimization problems with parameters that change in time are given
in [12], [28] and are based on predictor-corrector schemes. Even though
these algorithms are applicable to constrained problems, they assume
access to an initial solution $x^*(t_0)$, which may not be available in
practice. Some of the theoretical advances in these works have been
used to ease the computational burden of sequential convex programming
while solving nonconvex optimization problems, or nonlinear model
predictive control [3], [29], [30].

In this paper, we design iterative discrete-time sampling algorithms
initialized at an arbitrary point $x_0$ which converge asymptotically
to the solution trajectory $x^*(t)$ up to an error bound which may be
specified as arbitrarily small and depends on the sampling period $h$.
In particular, the methods proposed here yield a sequence of
approximate time-varying optimizers $\{x_k\}$, for which
$\limsup_{k \to \infty} \|x_k - x^*(t_k)\| \le \delta$, with $\delta$
dependent on the sampling period $h$. To do so, we predict where the
optimal continuous-time trajectory will be at the next sampling time
and track the associated prediction error based upon estimating the
curvature of the solution trajectory. Under suitable assumptions, we
establish that the proposed prediction-correction scheme attains an
asymptotic error bound of $O(h^2)$ and in some cases $O(h^4)$, which
outperforms the $O(h)$ error bound achieved by the state-of-the-art
method of [24].

In Section II, we analyze unconstrained optimization problems and we
propose algorithms to track their time-varying solution which fall into
the family of tracking algorithms with an arbitrary starting point. The
proposed methods are based on a predictor-corrector approach, where the
predictor step is generated via a Taylor expansion of the optimality
conditions, and the correction step may be either a single or multiple
gradient descent or Newton steps. In Section III, we show that our
tracking methods converge to the solution trajectory asymptotically,
with an error bound $O(h^2)$ (and in some cases $O(h^4)$ locally)
dependent on the sampling period $h$. This error bound improves upon
the existing methods, which attain an $O(h)$ bound. We further extend
the tracking framework to account for the case where the dependence of
the cost function on the time parameter is not known a priori but has
to be estimated, and establish that the $O(h^2)$ and the (local)
$O(h^4)$ asymptotic error bounds are achieved despite the associated
estimation uncertainty. In Section IV we numerically analyze the
performance of the proposed methods as compared with existing
approaches. In particular, in Section IV-A we consider a scalar example
and show that the convergence bounds hold in practice, and in Section
IV-B we apply the proposed method to a reference path following problem
and use the tools developed here to yield an effective control strategy
for an intelligent system. Finally, in Section V we close the paper
with concluding remarks.

Notation. Vectors are written as $x \in \mathbb{R}^n$ and matrices as
$A \in \mathbb{R}^{n \times n}$. We use $\|\cdot\|$ to denote the
Euclidean norm, in the case of vectors, matrices, and tensors alike.
The gradient of the function $f(x;t)$ with respect to $x$ at the point
$(x,t)$ is indicated as $\nabla_x f(x;t) \in \mathbb{R}^n$, while the
partial derivative of the same function w.r.t. $t$ at $(x,t)$ is
written as $\nabla_t f(x;t) \in \mathbb{R}$. Similarly, the notation
$\nabla_{xx} f(x;t) \in \mathbb{R}^{n \times n}$ denotes the Hessian of
$f(x;t)$ w.r.t. $x$ at $(x,t)$, whereas
$\nabla_{tx} f(x;t) \in \mathbb{R}^n$ denotes the partial derivative of
the gradient of $f(x;t)$ w.r.t. the time $t$ at $(x,t)$, i.e. the mixed
first-order partial derivative vector of the objective. The tensor
$\nabla_{xxx} f(x;t) \in \mathbb{R}^{n \times n \times n}$ indicates
the third derivative of $f(x;t)$ w.r.t. $x$ at $(x,t)$, the matrix
$\nabla_{xtx} f(x;t) = \nabla_{txx} f(x;t) \in \mathbb{R}^{n \times n}$
indicates the time derivative of the Hessian of $f(x;t)$ w.r.t. the
time $t$ at $(x,t)$, and the vector
$\nabla_{ttx} f(x;t) \in \mathbb{R}^n$ indicates the second derivative
in time of the gradient of $f(x;t)$ w.r.t. the time $t$ at $(x,t)$.


II. Algorithm definition

In this section we introduce a class of algorithms for solving 
optimization problem (1) using prediction and correction steps. 
In order to converge to the solution trajectory $x^*(t)$, we
generate a sequence of near optimal decision variables $\{x_k\}$
by taking into account both how the solution changes in time 
and how different our current update is from the optimizer at 
each time step. 

A. Gradient trajectory tracking 

In this paper we assume that the initial decision variable $x_0$ is not
necessarily the optimal solution of the initial objective function
$f(x;t_0)$, i.e., $x_0 \ne x^*(t_0)$. We model this assumption by
defining a residual error for the gradient of the initial variable
$\nabla_x f(x_0;t_0) = r(0)$. To improve the estimation for the
decision variable $x$, we set up a prediction-correction scheme
motivated by the Kalman filter strategy in estimation theory [31] and
by continuation methods in numerical analysis [32]. In the first step,
we predict how the solution changes, and in the correction step we use
descent methods to push the predicted variable towards the optimizer at
that time instance.^1

To generate the prediction step, we reformulate the time-varying
problem (1) in terms of its optimality conditions. Minimizing the
objective in (1) is equivalent to computing the solution of the
following nonlinear system of equations

$\nabla_x f(x^*(t); t) = 0$, (3)

for each $t$. These two problems are equivalent since the objective
functions $f(x;t)$ are strongly convex with respect to $x$ and only
their optimal solutions satisfy the condition in (3).

Consider an arbitrary vector $x \in \mathbb{R}^n$ which may be
interpreted as the state of a dynamical system. The objective function
gradient $\nabla_x f(x;t) \in \mathbb{R}^n$ computed at point $x$ is

$\nabla_x f(x;t) = r(t)$, (4)

where $r(t) \in \mathbb{R}^n$ is the residual error. The aim of the
prediction step is to keep the residual error as constant as possible
while the optimization problem is changing. To say it in another way,
we want to predict how to update $x_k$ such that we stay close to the
iso-residual manifold. We try to keep the evolution of the trajectory
close to the residual vector $r(t)$ which is equivalent to

$\nabla_x f(x + \delta x; t + \delta t) \approx \nabla_x f(x;t) + \nabla_{xx} f(x;t)\,\delta x + \nabla_{tx} f(x;t)\,\delta t = r(t)$, (5)

where $\delta x \in \mathbb{R}^n$ and the positive scalar $\delta t$
are the variations of the decision variable $x$ and the time variable
$t$, respectively. By subtracting (4) from (5) and dividing the
resulting equation by the time variation $\delta t$, we obtain the
continuous dynamical system

$\dot{x} = -[\nabla_{xx} f(x;t)]^{-1} \nabla_{tx} f(x;t)$, (6)

^1 This correction strategy has been called differently by different authors:
an alternative term is adaptation, as reported in [33], [34].
Algorithm 1 Gradient trajectory tracking (GTT)

Require: Initial variable $x_0$. Initial objective function $f(x;t_0)$, no.
of correction steps $\tau$
1: for $k = 0, 1, 2, \dots$ do
2:    Predict the solution using the prior information [cf. (7)]
         $x_{k+1|k} = x_k - h\,[\nabla_{xx} f(x_k;t_k)]^{-1} \nabla_{tx} f(x_k;t_k)$
3:    Acquire the updated function $f(x;t_{k+1})$
4:    Initialize the sequence of corrected variables $\hat{x}^0_{k+1} = x_{k+1|k}$
5:    for $s = 0 : \tau - 1$ do
6:       Correct the variable by the gradient step [cf. (8)]
            $\hat{x}^{s+1}_{k+1} = \hat{x}^s_{k+1} - \gamma \nabla_x f(\hat{x}^s_{k+1}; t_{k+1})$
7:    end for
8:    Set the corrected variable $x_{k+1} = \hat{x}^\tau_{k+1}$
9: end for


where $\dot{x} := \delta x/\delta t$. We then consider the discrete
time approximation of (6), which amounts to sampling the problem at
times $t_k$, for $k = 0, 1, 2, \dots$. The prediction step consists of
a discrete-time approximation of integrating (6) by using an Euler
scheme. Let $x_{k+1|k} \in \mathbb{R}^n$ be the predicted decision
variable based on the available information up to time $t_k$, then we
may write the Euler integral approximation of (6) as

$x_{k+1|k} = x_k - h\,[\nabla_{xx} f(x_k;t_k)]^{-1} \nabla_{tx} f(x_k;t_k)$. (7)

Observe that the prediction step in (7) is computed by only
incorporating information available at time $t_k$; however, the
decision variable $x_{k+1|k}$ is supposed to be close to the
iso-residual manifold of the objective function at time $t_{k+1}$.
The gradient trajectory tracking (GTT) algorithm uses the gradient
descent method to correct the predicted decision variable $x_{k+1|k}$.
This procedure modifies the predicted variable $x_{k+1|k}$ towards the
optimal argument of the objective function at time $t_{k+1}$.
Therefore, the correction (or adaptation) step of GTT requires
execution of the gradient descent method based on the updated objective
function $f(x;t_{k+1})$. Depending on the sampling period $h$, we can
afford a specific number of gradient descent steps until sampling the
next function.
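To make the prediction step (7) concrete, consider a simple quadratic objective. This example is an illustrative assumption and is not taken from the paper: $f(x;t) = \tfrac{1}{2}\|x - a(t)\|^2$, with a reference signal $a(t)$ assumed twice differentiable with bounded second derivative. The derivation below shows that the prediction simply moves $x_k$ along with the reference, so that the residual stays constant up to $O(h^2)$.

```latex
\begin{align*}
\nabla_x f(x;t) &= x - a(t), \qquad
\nabla_{xx} f(x;t) = I, \qquad
\nabla_{tx} f(x;t) = -\dot{a}(t), \\
\dot{x} &= -[\nabla_{xx} f(x;t)]^{-1}\nabla_{tx} f(x;t) = \dot{a}(t)
  && \text{(dynamics (6))}, \\
x_{k+1|k} &= x_k + h\,\dot{a}(t_k)
  && \text{(prediction (7))}, \\
\nabla_x f(x_{k+1|k};t_{k+1})
  &= \bigl(x_k - a(t_k)\bigr) - \bigl(a(t_{k+1}) - a(t_k) - h\,\dot{a}(t_k)\bigr)
   = r(t_k) + O(h^2).
\end{align*}
```

In this special case the predicted point carries the residual $r(t_k)$ forward almost unchanged, which is the iso-residual behavior that the prediction step aims for.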

Define $\tau$ as the number of gradient descent steps used for
correcting the predicted decision variable $x_{k+1|k}$. Further, define
$\hat{x}^s_{k+1} \in \mathbb{R}^n$ as the corrected decision variable
after executing $s$ steps of the gradient descent method. Therefore,
the sequence of variables $\hat{x}^s_{k+1}$ is initialized by
$\hat{x}^0_{k+1} = x_{k+1|k}$ and updated by the recursion

$\hat{x}^{s+1}_{k+1} = \hat{x}^s_{k+1} - \gamma \nabla_x f(\hat{x}^s_{k+1}; t_{k+1})$, (8)

where $\gamma > 0$ is the stepsize. The output of the recursive update
(8) after $\tau$ steps is the decision variable of the GTT algorithm
at time $t_{k+1}$, i.e., $x_{k+1} = \hat{x}^\tau_{k+1}$.

We summarize the GTT scheme in Algorithm 1. Observe that Step 2 and
Step 6 implement the prediction-correction scheme. In Step 2, we
compute a first-order approximation of the gradient $\nabla_x f(x;t)$
at time $t_k$ [cf. (7)]. Then we correct the predicted solution by
executing $\tau$ gradient descent steps as stated in (8) for the
updated objective function $f(x;t_{k+1})$ in Steps 5-7. The sequence of
corrected variables is initialized by the predicted solution
$\hat{x}^0_{k+1} = x_{k+1|k}$ in Step 4 and the output of the recursion
is considered as the updated variable $x_{k+1} = \hat{x}^\tau_{k+1}$ in
Step 8. The implementation of gradient descent for the correction
process requires access to the updated function $f(x;t_{k+1})$ which is
sampled in Step 3.

Note that the GTT correction step is done by executing $\tau$ gradient
descent steps which only use first-order information of the objective
function $f$. We accelerate this procedure using second-order
information in the following subsection.
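The GTT recursion described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the scalar objective $f(x;t) = \tfrac{1}{2}(x - \cos t)^2 + \log(1 + e^x)$, the helper names (`grad_x`, `hess_xx`, `grad_tx`, `gtt`, `optimizer`) and all parameter values are hypothetical choices made only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative objective (an assumption, not from the paper):
# f(x; t) = 0.5 * (x - cos t)^2 + log(1 + exp(x))
def grad_x(x, t):
    """Gradient of f with respect to x."""
    return (x - math.cos(t)) + sigmoid(x)

def hess_xx(x, t):
    """Hessian of f with respect to x (scalar, always >= 1)."""
    s = sigmoid(x)
    return 1.0 + s * (1.0 - s)

def grad_tx(x, t):
    """Time derivative of the gradient."""
    return math.sin(t)

def gtt(x0, h, num_samples, tau, gamma):
    """Run GTT and return the iterates x_0, x_1, ..., x_K."""
    xs = [x0]
    x = x0
    for k in range(num_samples):
        t_k, t_next = k * h, (k + 1) * h
        # Prediction, eq. (7): Euler step on the iso-residual dynamics.
        x = x - h * grad_tx(x, t_k) / hess_xx(x, t_k)
        # Correction, eq. (8): tau gradient steps on f(.; t_{k+1}).
        for _ in range(tau):
            x = x - gamma * grad_x(x, t_next)
        xs.append(x)
    return xs

def optimizer(t, iters=50):
    """Reference x*(t) via Newton's method, used only to measure the error."""
    x = 0.0
    for _ in range(iters):
        x -= grad_x(x, t) / hess_xx(x, t)
    return x

h = 0.1
xs = gtt(x0=2.0, h=h, num_samples=200, tau=3, gamma=0.8)
final_error = abs(xs[-1] - optimizer(200 * h))
```

The starting point is deliberately far from $x^*(t_0)$, in line with the arbitrary-initialization setting of the text; `final_error` measures the tracking error at the last sample.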

B. Newton trajectory tracking 

The GTT prediction step introduced in (7) requires computation of the
partial Hessian inverse $[\nabla_{xx} f(x_k;t_k)]^{-1}$. Note that the
computational complexity of the Hessian inverse is of order $O(n^3)$,
which is affordable when $n$ is of moderate size or a certain level of
latency associated with this inverse computation will not degrade
performance. These two observations justify using the Newton method for
the correction (or adaptation) step as well, which requires computation
of the partial Hessian inverse of the objective function. Therefore, we
introduce the Newton trajectory tracking (NTT) method as an algorithm
that uses second-order information for both the prediction and
correction steps.

The prediction step of the NTT algorithm is identical to the prediction
step of the GTT method as introduced in (7); however, in the correction
steps NTT updates the predicted solution trajectory by applying $\tau$
steps of the Newton method. In particular, the predicted variable
$x_{k+1|k}$ in (7) is used for initializing the sequence of corrected
variables $\hat{x}^s_{k+1}$, i.e., $\hat{x}^0_{k+1} := x_{k+1|k}$. The
sequence of corrected variables $\hat{x}^s_{k+1}$ is updated using
Newton steps as

$\hat{x}^{s+1}_{k+1} = \hat{x}^s_{k+1} - [\nabla_{xx} f(\hat{x}^s_{k+1}; t_{k+1})]^{-1} \nabla_x f(\hat{x}^s_{k+1}; t_{k+1})$. (9)

The decision variable (solution) at step $t_{k+1}$ for the NTT
algorithm, $x_{k+1}$, is the outcome of $\tau$ iterations of (9), such
that $x_{k+1} = \hat{x}^\tau_{k+1}$.

Observe that the computational time of the Newton step and the gradient
descent step are different. The complexity of the Newton step is in the
order of $O(n^3)$, while the gradient descent step requires a
computational complexity of order $O(n)$. Since the sampling period is
a fixed value, the number of Newton iterations in one iteration of the
NTT algorithm is smaller than the number of gradient descent steps that
we can afford in the correction step of GTT. On the other hand, the
Newton method requires fewer iterations than the gradient descent
method to achieve a comparable accuracy. In particular, for an
optimization problem with a large condition number the difference
between the convergence speeds of these algorithms is substantial, in
which case NTT is preferable to GTT.
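For comparison with the gradient-based correction, the NTT recursion (7) and (9) can be sketched as follows. As before, this is only an illustrative sketch: the scalar objective and the names `ntt` and `optimizer` are hypothetical, and in the scalar case the Hessian inverse reduces to a division.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative objective (an assumption, not from the paper):
# f(x; t) = 0.5 * (x - cos t)^2 + log(1 + exp(x))
def grad_x(x, t):
    return (x - math.cos(t)) + sigmoid(x)

def hess_xx(x, t):
    s = sigmoid(x)
    return 1.0 + s * (1.0 - s)

def grad_tx(x, t):
    return math.sin(t)

def ntt(x0, h, num_samples, tau):
    """Run NTT and return the last iterate x_K."""
    x = x0
    for k in range(num_samples):
        t_k, t_next = k * h, (k + 1) * h
        # Prediction, eq. (7): same Euler step as in GTT.
        x = x - h * grad_tx(x, t_k) / hess_xx(x, t_k)
        # Correction, eq. (9): tau Newton steps on f(.; t_{k+1}).
        for _ in range(tau):
            x = x - grad_x(x, t_next) / hess_xx(x, t_next)
    return x

def optimizer(t, iters=50):
    """Reference x*(t) via Newton's method, used only to measure the error."""
    x = 0.0
    for _ in range(iters):
        x -= grad_x(x, t) / hess_xx(x, t)
    return x

h = 0.1
ntt_error = abs(ntt(x0=2.0, h=h, num_samples=200, tau=1) - optimizer(200 * h))
```

Even with a single Newton step per sample ($\tau = 1$), the tracking error on this toy problem ends up well below what the same sampling period gives with a few gradient steps, which matches the qualitative comparison made in the text.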

In developing the prediction steps of the GTT and NTT algorithms we
assumed that the mixed partial derivative $\nabla_{tx} f(x;t)$ is
available; however, frequently in applications the variation of the
objective function over time is not known. This motivates the idea of
approximating the objective function variation, which we study in the
following subsection.

C. Time derivative approximation 

Consider approximating the mixed partial derivative at time $t_k$ using
the gradient of the objective with respect to $x$ at times $t_k$ and
Algorithm 2 Newton trajectory tracking (NTT)

Require: Initial variable $x_0$. Initial objective function $f(x;t_0)$, no.
of correction steps $\tau$
1: for $k = 0, 1, 2, \dots$ do
2:    Predict the solution using the prior information [cf. (7)]
         $x_{k+1|k} = x_k - h\,[\nabla_{xx} f(x_k;t_k)]^{-1} \nabla_{tx} f(x_k;t_k)$
3:    Acquire the updated function $f(x;t_{k+1})$
4:    Initialize the sequence of corrected variables $\hat{x}^0_{k+1} = x_{k+1|k}$
5:    for $s = 0 : \tau - 1$ do
6:       Correct the variable by the Newton step [cf. (9)]
            $\hat{x}^{s+1}_{k+1} = \hat{x}^s_{k+1} - [\nabla_{xx} f(\hat{x}^s_{k+1}; t_{k+1})]^{-1} \nabla_x f(\hat{x}^s_{k+1}; t_{k+1})$
7:    end for
8:    Set the corrected variable $x_{k+1} = \hat{x}^\tau_{k+1}$
9: end for


$t_{k-1}$, that is, the approximate partial mixed gradient $\tilde{\nabla}_{tx} f_k$ as

$\tilde{\nabla}_{tx} f_k = \frac{1}{h}\left(\nabla_x f(x_k;t_k) - \nabla_x f(x_k;t_{k-1})\right)$, (10)

which is called a first-order backward finite difference since it
requires information of the first previous step for approximating the
current mixed partial derivative. The error of this approximation is
bounded on the order of $O(h)$ [35], which may be improved by using the
gradients and mixed partial derivative $\nabla_{tx} f(x_k;t_k)$ of more
than one previous step, if needed.^2

Substituting the partial mixed gradient $\nabla_{tx} f(x_k;t_k)$ in (7)
by its approximation $\tilde{\nabla}_{tx} f_k$ in (10) leads to the
approximate prediction step

$x_{k+1|k} = x_k - h\,[\nabla_{xx} f(x_k;t_k)]^{-1} \tilde{\nabla}_{tx} f_k$. (11)

The predicted variable $x_{k+1|k}$ is an initial estimate for the
optimal solution of the objective function $f(x;t_{k+1})$. This
estimate can be corrected by descending toward the optimal argument of
the objective function $f(x;t_{k+1})$. To do so, one may either use a
gradient algorithm as in (8) or Newton steps as in (9). Based on this
idea, we introduce the approximate gradient tracking (AGT) algorithm,
which differs from GTT in using the approximate prediction step in (11)
instead of the exact update in (7). Likewise, we introduce the
approximate Newton tracking (ANT) method as a variation of the NTT
algorithm. We summarize the AGT and ANT methods which make use of this
approximation scheme in Algorithms 3 and 4, respectively. As we can
observe, the main difference with Algorithms 1 and 2 is in Step 2,
where we use the approximate time derivative. In Section III we
establish that this time derivative approximation does not
significantly degrade the performance of the algorithms presented here.
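The approximate prediction (10)-(11) can be sketched as below for the gradient-corrected variant (AGT). This is an illustrative sketch only: the objective and the function names are hypothetical, and skipping the prediction at $k = 0$ (where no previous sample $f(\cdot;t_{-1})$ exists) is an assumption of this sketch rather than something stated in the text. In a real deployment the previously sampled function would be stored; here it is emulated by evaluating the gradient at time $t_k - h$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative objective (an assumption, not from the paper):
# f(x; t) = 0.5 * (x - cos t)^2 + log(1 + exp(x))
def grad_x(x, t):
    return (x - math.cos(t)) + sigmoid(x)

def hess_xx(x, t):
    s = sigmoid(x)
    return 1.0 + s * (1.0 - s)

def agt(x0, h, num_samples, tau, gamma):
    """Run AGT and return the last iterate; no access to grad_tx is needed."""
    x = x0
    for k in range(num_samples):
        t_k, t_next = k * h, (k + 1) * h
        if k > 0:  # assumption: no prediction until a previous sample exists
            # Backward finite difference, eq. (10), at the same point x_k.
            approx_tx = (grad_x(x, t_k) - grad_x(x, t_k - h)) / h
            # Approximate prediction, eq. (11).
            x = x - h * approx_tx / hess_xx(x, t_k)
        # Correction, eq. (8): tau gradient steps on f(.; t_{k+1}).
        for _ in range(tau):
            x = x - gamma * grad_x(x, t_next)
    return x

def optimizer(t, iters=50):
    """Reference x*(t) via Newton's method, used only to measure the error."""
    x = 0.0
    for _ in range(iters):
        x -= grad_x(x, t) / hess_xx(x, t)
    return x

h = 0.1
agt_error = abs(agt(2.0, h, 200, tau=3, gamma=0.8) - optimizer(200 * h))
# Finite-difference estimate of the mixed derivative at (x, t) = (0.3, 1.0);
# for this objective the exact value is sin(1.0).
fd_estimate = (grad_x(0.3, 1.0) - grad_x(0.3, 1.0 - h)) / h
```

For this objective the finite-difference error is at most $h/2$ (since $|\nabla_{ttx} f| \le 1$), which illustrates the $O(h)$ accuracy of (10).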

III. Convergence Analysis 

We turn to establishing that the prediction-correction schemes derived
in Section II solve the continuous-time problem stated in (1) up to an
error term which is dependent on the discrete-time sampling period. In
order to do so, some technical conditions are required which we state
below.

^2 Approximation errors of the order of $O(h^2)$, $O(h^3)$, and $O(h^4)$ can be
achieved, e.g., by the recursive method presented in [36].


Algorithm 3 Approximate gradient tracking (AGT)

Require: Initial variable $x_0$. Initial objective function $f(x;t_0)$, no.
of correction steps $\tau$
1: for $k = 0, 1, 2, \dots$ do
2:    Predict the solution using the prior information [cf. (10)-(11)]
         $x_{k+1|k} = x_k - h\,[\nabla_{xx} f(x_k;t_k)]^{-1} \tilde{\nabla}_{tx} f_k$
3:    Acquire the updated function $f(x;t_{k+1})$
4:    Initialize the sequence of corrected variables $\hat{x}^0_{k+1} = x_{k+1|k}$
5:    for $s = 0 : \tau - 1$ do
6:       Correct the variable by the gradient step [cf. (8)]
            $\hat{x}^{s+1}_{k+1} = \hat{x}^s_{k+1} - \gamma \nabla_x f(\hat{x}^s_{k+1}; t_{k+1})$
7:    end for
8:    Set the corrected variable $x_{k+1} = \hat{x}^\tau_{k+1}$
9: end for


Algorithm 4 Approximate Newton tracking (ANT)

Require: Initial variable $x_0$. Initial objective function $f(x;t_0)$, no.
of correction steps $\tau$
1: for $k = 0, 1, 2, \dots$ do
2:    Predict the solution using the prior information [cf. (10)-(11)]
         $x_{k+1|k} = x_k - h\,[\nabla_{xx} f(x_k;t_k)]^{-1} \tilde{\nabla}_{tx} f_k$
3:    Acquire the updated function $f(x;t_{k+1})$
4:    Initialize the sequence of corrected variables $\hat{x}^0_{k+1} = x_{k+1|k}$
5:    for $s = 0 : \tau - 1$ do
6:       Correct the variable by the Newton step [cf. (9)]
            $\hat{x}^{s+1}_{k+1} = \hat{x}^s_{k+1} - [\nabla_{xx} f(\hat{x}^s_{k+1}; t_{k+1})]^{-1} \nabla_x f(\hat{x}^s_{k+1}; t_{k+1})$
7:    end for
8:    Set the corrected variable $x_{k+1} = \hat{x}^\tau_{k+1}$
9: end for


Assumption 1: The function $f(x;t)$ is twice differentiable and
$m$-strongly convex in $x \in \mathbb{R}^n$ and uniformly in $t$, that
is, the Hessian of $f(x;t)$ with respect to $x$ is bounded below by $m$
for each $x \in \mathbb{R}^n$ and uniformly in $t$,

$\nabla_{xx} f(x;t) \succeq m I, \quad \forall x \in \mathbb{R}^n, t$.

Assumption 2: The function $f(x;t)$ is sufficiently smooth both in
$x \in \mathbb{R}^n$ and in $t$, and in particular, $f(x;t)$ has
bounded second and third order derivatives with respect to
$x \in \mathbb{R}^n$ and $t$ as

$\|\nabla_{xx} f(x;t)\| \le L, \quad \|\nabla_{tx} f(x;t)\| \le C_0, \quad \|\nabla_{xxx} f(x;t)\| \le C_1,$
$\|\nabla_{xtx} f(x;t)\| \le C_2, \quad \|\nabla_{ttx} f(x;t)\| \le C_3.$

Assumption 1, besides guaranteeing that problem (1) is strongly convex
and has a unique solution for each time instance, is needed to ensure
that the Hessian of the objective function $f(x;t)$ is invertible. The
fact that the solution is unique for each time instance implies that
the solution trajectory is unique. This mathematical setting frequently
appears in the analysis of optimization tools in time-varying settings,
and is essential to establishing trajectory tracking results; see, for
instance, [6], [12], [24], [37]. Assumption 2 ensures that the Hessian
is bounded from above, a property which is equivalent to the Lipschitz
continuity of the gradient, and that the third derivative tensor
$\nabla_{xxx} f(x;t)$ is also bounded above (typically required for the
analysis of Newton-type algorithms), as well as boundedness of the time
variations of gradient and Hessian. These last properties ensure the
possibility to build a prediction scheme based on the (estimated)
knowledge of how the function and its derivatives change in time. A
similar assumption was required (albeit only locally) for the local
convergence analysis in [12, Eq. (3.2)].

Assumptions 1 and 2 are sufficient to show that the solution mapping
$t \mapsto x^*(t)$ is single-valued and locally Lipschitz continuous in
$t$, and in particular,

$\|x^*(t_{k+1}) - x^*(t_k)\| \le \frac{1}{m}\|\nabla_{tx} f(x;t)\|\,(t_{k+1} - t_k) \le \frac{C_0 h}{m}$, (12)

see for example [26, Theorem 2F.10]. This gives us a link between the
sampling period $h$ and the allowed variations in the optimizers. This
property also allows our algorithms to converge to a neighborhood of
the optimal solution. We remark that, in most of the current
literature, the condition in (12) is taken as an assumption (that is,
one assumes that the optimizer does not change more than a certain
upper bound in time), while here it is a consequence of our smoothness
and boundedness assumptions.
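The Lipschitz property of the optimizer trajectory can be checked numerically on a small example. The sketch below is an illustration under stated assumptions: the scalar objective is a hypothetical choice for which $m = 1$ and $C_0 = 1$ (its Hessian is at least 1 and $|\nabla_{tx} f| = |\sin t| \le 1$), and the reference optimizer is computed by Newton's method.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative objective (an assumption, not from the paper):
# f(x; t) = 0.5 * (x - cos t)^2 + log(1 + exp(x))
def grad_x(x, t):
    return (x - math.cos(t)) + sigmoid(x)

def hess_xx(x, t):
    s = sigmoid(x)
    return 1.0 + s * (1.0 - s)

def optimizer(t, iters=50):
    """x*(t) computed by Newton's method on the optimality condition (3)."""
    x = 0.0
    for _ in range(iters):
        x -= grad_x(x, t) / hess_xx(x, t)
    return x

m, C0 = 1.0, 1.0          # constants of Assumptions 1-2 for this objective
h = 0.1
trajectory = [optimizer(k * h) for k in range(101)]
# Largest change of the optimizer between consecutive sampling times.
max_drift = max(abs(b - a) for a, b in zip(trajectory, trajectory[1:]))
drift_bound = C0 * h / m  # right-hand side of (12)
```

On this example `max_drift` stays below `drift_bound`, as the bound predicts.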

Remark 1: Assumptions 1 and 2 can be weakened if a priori knowledge of
the domain of the optimizers and the sequence generated by the
algorithms is given by the structure of the problem, i.e. the optimal
trajectory is contained within a subset $X$ of $\mathbb{R}^n$. In this
case, we can concentrate on functions that verify Assumptions 1 and 2
only for $x \in X \subset \mathbb{R}^n$. We explore this scenario in
the second numerical example. An alternative setting in which
Assumptions 1 and 2 need not hold is if we restrict (project) the
algorithms to a neighborhood of the optimal trajectory. In this latter
case, the convergence analysis becomes local only.

We start the convergence analysis by deriving an upper bound on the
norm of the approximation error $\Delta_k \in \mathbb{R}^n$ of the
first-order forward Euler integral in (7) (w.r.t. the continuous
dynamics (6)). This error is sometimes referred to as the local
truncation error [35]. The error is defined as the difference between
the predicted $x_{k+1|k}$ in (7) and the exact prediction $x(t_{k+1})$
obtained by integrating the continuous dynamics (6) from the same
initial condition $x_k$, i.e.,

$\Delta_k := x_{k+1|k} - x(t_{k+1})$. (13)

The upper bound for the norm $\|\Delta_k\|$ is central in all our
algorithms, since it encodes the error coming from the prediction step.
We study this upper bound in the following proposition.

Proposition 1: Under Assumptions 1-2, the error norm $\|\Delta_k\|$ of
the Euler approximation (7) defined in (13) is upper bounded by

$\|\Delta_k\| \le \frac{h^2}{2}\left[\frac{C_0^2 C_1}{m^3} + \frac{2 C_0 C_2}{m^2} + \frac{C_3}{m}\right] = O(h^2)$. (14)

Proof: See Appendix A. ■

Proposition 1 states that the norm of the discretization error
$\|\Delta_k\|$ is bounded above by a constant which is in the order of
$O(h^2)$. We use this upper bound in proving convergence of all the
proposed methods.
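The $O(h^2)$ scaling of the Euler prediction error can be observed numerically. The following sketch is an illustration under assumptions: it uses a hypothetical scalar objective for which $m = 1$, $C_0 = 1$, $C_1 = 1/(6\sqrt{3})$, $C_2 = 0$ and $C_3 = 1$, and it obtains the "exact" $x(t_{k+1})$ by integrating (6) with a fine classical Runge-Kutta scheme, which is a choice of this sketch and not part of the paper's analysis.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative objective (an assumption, not from the paper):
# f(x; t) = 0.5 * (x - cos t)^2 + log(1 + exp(x))
def hess_xx(x, t):
    s = sigmoid(x)
    return 1.0 + s * (1.0 - s)

def grad_tx(x, t):
    return math.sin(t)

def xdot(x, t):
    """Right-hand side of the iso-residual dynamics (6)."""
    return -grad_tx(x, t) / hess_xx(x, t)

def euler_error(x, t, h, substeps=200):
    """|x_{k+1|k} - x(t_{k+1})|: Euler prediction (7) vs. fine RK4 reference."""
    predicted = x + h * xdot(x, t)
    dt = h / substeps
    xr, tr = x, t
    for _ in range(substeps):
        k1 = xdot(xr, tr)
        k2 = xdot(xr + 0.5 * dt * k1, tr + 0.5 * dt)
        k3 = xdot(xr + 0.5 * dt * k2, tr + 0.5 * dt)
        k4 = xdot(xr + dt * k3, tr + dt)
        xr += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        tr += dt
    return abs(predicted - xr)

def prop1_bound(h, m, C0, C1, C2, C3):
    """Right-hand side of (14)."""
    return 0.5 * h * h * (C0**2 * C1 / m**3 + 2 * C0 * C2 / m**2 + C3 / m)

err_h = euler_error(0.5, 1.0, 0.01)
err_half = euler_error(0.5, 1.0, 0.005)
ratio = err_h / err_half   # close to 4 if the error scales as h^2
bound = prop1_bound(0.01, 1.0, 1.0, 1.0 / (6.0 * math.sqrt(3.0)), 0.0, 1.0)
```

Halving the sampling period reduces the prediction error by a factor close to four, and the measured error stays below the right-hand side of (14) for this example.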


A. Gradient trajectory tracking convergence 

We study the convergence properties of the sequence of variables $x_k$
generated by GTT for different choices of the stepsize. In the
following theorem we show that the optimality gap
$\|x_k - x^*(t_k)\|$ converges exponentially to an error bound.

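Theorem 1: Consider the gradient trajectory tracking algorithm as
defined in (3)-(8). Let Assumptions 1-2 hold true and define the
constants $\rho$ and $\sigma$ as

$\rho := \max\{|1 - \gamma m|, |1 - \gamma L|\}, \quad \sigma := 1 + h\left(\frac{C_0 C_1}{m^2} + \frac{C_2}{m}\right)$. (15)

Let the stepsize $\gamma$ be chosen as $0 < \gamma < 2/L$, which
implies $\rho < 1$.

i) For any sampling period $h$, the sequence $\{x_k\}$ converges to
$x^*(t_k)$ exponentially up to a bounded error as

$\|x_k - x^*(t_k)\| \le \rho^{\tau k}\|x_0 - x^*(t_0)\| + \rho^\tau\left[\frac{2 C_0}{m}h + \frac{h^2}{2}\left[\frac{C_0^2 C_1}{m^3} + \frac{2 C_0 C_2}{m^2} + \frac{C_3}{m}\right]\right]\left[\frac{1 - \rho^{\tau k}}{1 - \rho^\tau}\right]$. (16)

ii) If the sampling period $h$ is chosen such that $\rho^\tau\sigma < 1$, i.e.,

$h < \left[\frac{C_0 C_1}{m^2} + \frac{C_2}{m}\right]^{-1}\left(\rho^{-\tau} - 1\right)$, (17)

then the sequence $\{x_k\}$ converges to $x^*(t_k)$ exponentially up to
a bounded error as

$\|x_k - x^*(t_k)\| \le (\rho^\tau\sigma)^k\|x_0 - x^*(t_0)\| + \rho^\tau\frac{h^2}{2}\left[\frac{C_0^2 C_1}{m^3} + \frac{2 C_0 C_2}{m^2} + \frac{C_3}{m}\right]\left[\frac{1 - (\rho^\tau\sigma)^k}{1 - \rho^\tau\sigma}\right]$. (18)

Proof: See Appendix B. ■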

Theorem 1 states the convergence properties of the GTT algorithm for
different choices of the parameters. In both cases exponential
convergence to a neighborhood is shown; however, the accuracy of
convergence depends on the choice of the sampling period $h$, the
stepsize parameter $\gamma$, and the number of gradient descent steps
$\tau$. To guarantee that the constant $\rho$ is strictly smaller than
1, the stepsize must satisfy $\gamma < 2/L$; this can be seen from the
definition of $\rho$ and the fact that $m \le L$ by Assumptions 1 and
2. Then, for any choice of the sampling period $h$ the result in (16)
holds, which implies exponential convergence to a neighborhood of the
optimal solution. In this case the error bound contains two terms that
are proportional to $h$ and $h^2$. Therefore, we can say that the
accuracy of convergence is in the order of $O(h)$. Notice that
increasing the number of gradient descent iterations $\tau$ improves
the speed of exponential convergence by decreasing the factor
$\rho^\tau$. Moreover, a larger choice of $\tau$ leads to a better
accuracy since the asymptotic error bound is proportional to
$\rho^\tau/(1 - \rho^\tau)$.

The result in (18) shows that the accuracy of convergence is
proportional to the square of the sampling period $h$, if the sampling
period is chosen to satisfy the condition $\rho^\tau\sigma < 1$. In the
following corollary we formalize this observation by studying the
asymptotic convergence results of GTT for different choices of
stepsize.

Corollary 1: Under the same conditions of Theorem 1, the sequence of
variables $\{x_k\}$ generated by GTT converges to a neighborhood of
$x^*(t_k)$ asymptotically. The error bound when the parameters $\rho$
and $\sigma$ in (15) are chosen as $\rho^\tau\sigma \ge 1$,
$\rho^\tau < 1$ is

$\limsup_{k \to \infty} \|x_k - x^*(t_k)\| \le \frac{2 C_0 \rho^\tau}{m(1 - \rho^\tau)}\,h + O(h^2) = O(h)$, (19)

and if they satisfy $\rho^\tau\sigma < 1$ the error bound is

$\limsup_{k \to \infty} \|x_k - x^*(t_k)\| \le \frac{h^2 \rho^\tau}{2(1 - \rho^\tau\sigma)}\left[\frac{C_0^2 C_1}{m^3} + \frac{2 C_0 C_2}{m^2} + \frac{C_3}{m}\right] = O(h^2)$. (20)

The asymptotic results in Corollary 1 are implied by considering the
results in Theorem 1 when $k \to \infty$. Notice that when the stepsize
satisfies the conditions $\rho^\tau\sigma \ge 1$, $\rho^\tau < 1$, the
convergence accuracy of GTT is in the order of $O(h)$. Moreover, if the
sampling period $h$ is chosen such that $\rho^\tau\sigma < 1$, then the
error bound is in the order of $O(h^2)$.
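To see how the quantities of Theorem 1 and Corollary 1 interact, the constants in (15), the sampling-period threshold (17) and the asymptotic bound (20) can be evaluated for given problem constants. The helper below is a hypothetical sketch (the name `gtt_constants` and the example values are assumptions); it returns `None` for the bound whenever $\rho^\tau\sigma \ge 1$, in which case only the $O(h)$ result applies.

```python
import math

def gtt_constants(gamma, tau, h, m, L, C0, C1, C2, C3):
    """Evaluate rho, sigma from (15), the threshold (17) and the bound (20)."""
    rho = max(abs(1.0 - gamma * m), abs(1.0 - gamma * L))
    growth = C0 * C1 / m**2 + C2 / m
    sigma = 1.0 + h * growth
    # Largest sampling period allowed by (17); unbounded in degenerate cases.
    if growth > 0.0 and rho > 0.0:
        h_max = (rho ** (-tau) - 1.0) / growth
    else:
        h_max = math.inf
    bracket = C0**2 * C1 / m**3 + 2.0 * C0 * C2 / m**2 + C3 / m
    contraction = rho**tau * sigma
    bound = None
    if contraction < 1.0:
        # Asymptotic O(h^2) error bound of (20).
        bound = 0.5 * h * h * rho**tau / (1.0 - contraction) * bracket
    return rho, sigma, h_max, bound

# Example constants (assumed values, matching a scalar toy problem with
# Hessian in [1, 1.25], |grad_tx| <= 1 and third derivative <= 1/(6*sqrt(3))).
h = 0.1
rho, sigma, h_max, bound = gtt_constants(
    gamma=0.8, tau=3, h=h, m=1.0, L=1.25,
    C0=1.0, C1=1.0 / (6.0 * math.sqrt(3.0)), C2=0.0, C3=1.0,
)
```

With these values the contraction factor $\rho^\tau\sigma$ is far below 1 and the sampling period is well inside the range allowed by (17), so the $O(h^2)$ regime applies.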


B. Newton trajectory tracking convergence 


Notice that the GTT algorithm does not incorporate the second-order
information of the objective function $f(x;t_{k+1})$ to correct the
predicted variable $x_{k+1|k}$, while the NTT algorithm uses Newton's
method in the correction step. Similar to the advantages of Newton's
method relative to the gradient descent algorithm, we expect to observe
faster convergence and more accurate estimation for NTT relative to
GTT. In particular, one would expect that if Newton's method is in its
quadratic phase, the error should be at least in the order of
$O(h^4)$. In the following theorem we show that when both the initial
estimate $x_0$ is close enough to the initial solution $x^*(t_0)$ and
the sampling period $h$ is chosen properly, then NTT yields a more
accurate convergence relative to GTT.


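Theorem 2: Consider the NTT algorithm generated by (7) and (9). Assume
that all the conditions in Assumptions 1-2 hold. Define constants
$\delta_1$, $\delta_2$, and $Q$ as

$\delta_1 := \frac{C_0 C_1}{m^2} + \frac{C_2}{m}, \quad \delta_2 := \frac{C_0^2 C_1}{2 m^3} + \frac{C_0 C_2}{m^2} + \frac{C_3}{2m}, \quad Q := \frac{2m}{C_1}$. (21)

Further, recall $\tau$ as the number of Newton steps in the correction
step. For any constant $c > 0$, if the sampling period $h$ satisfies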

h ^ min 


r 

’[((1 + < 5 i)c + < 52 ) 



( 22 ) 


and the initial error $\|x_0 - x^*(t_0)\|$ satisfies the condition

$\|x_0 - x^*(t_0)\| \le c h^2$, (23)

then the sequence $\|x_k - x^*(t_k)\|$ generated by NTT for $k \ge 1$
is bounded above as

$\|x_k - x^*(t_k)\| \le Q^{-(2^\tau - 1)} (\sigma c + \delta_2)^{2^\tau} h^{2^{\tau+1}}$. (24)

Proof: See Appendix C. ■

Theorem 2 establishes that, under additional conditions, the NTT tracks
the optimal trajectory $x^*(t_k)$ up to an error bound not larger than

$Q^{-(2^\tau - 1)} (\sigma c + \delta_2)^{2^\tau} h^{2^{\tau+1}}$, (25)

where $h$ is the sampling period. This is a result of the quadratic
convergence of Newton's method.
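As a rough consistency check of how quadratic convergence produces the $h^4$ dependence, one can chain the prediction and correction bounds. The sketch below is not the paper's proof (which is in Appendix C); it assumes that one Newton step contracts the error as $e^+ \le Q^{-1} e^2$ with $Q = 2m/C_1$, and that the prediction step maps an error of at most $c h^2$ into at most $(\sigma c + \delta_2) h^2$, where $\delta_2 h^2$ is the bound of Proposition 1. The symbol $e_s$ for the error after $s$ Newton steps is introduced here for illustration only.

```latex
\begin{align*}
e_0 &:= \|x_{k+1|k} - x^*(t_{k+1})\| \le (\sigma c + \delta_2)\,h^2, \\
e_{s+1} &\le Q^{-1} e_s^2
  \quad\Longrightarrow\quad
  e_\tau \le Q^{-(2^\tau - 1)}\, e_0^{2^\tau}
  \le Q^{-(2^\tau - 1)} (\sigma c + \delta_2)^{2^\tau} h^{2^{\tau+1}}, \\
\tau = 1:\quad
  e_1 &\le Q^{-1} (\sigma c + \delta_2)^2\, h^4 = O(h^4).
\end{align*}
```

Under these assumptions a single Newton step per sample already squares the $O(h^2)$ prediction error, which is where the fourth-order dependence on the sampling period comes from.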

The conditions can be intuitively explained as follows. Condition (23)
formalizes the local nature of the convergence analysis of Theorem 2:
due to the dependence of (22) on $c$, the right-hand side of (23) is in
fact upper bounded. For example, when $c \to \infty$, then $h \to 0$
and $c h^2 \to Q/(1 + \delta_1)$. We notice that the initial gap is
proportional to $h^2$ since the integration error $\|\Delta_k\|$ has
the same dependence on $h$. Finally, (22) derives an upper bound on the
allowable sampling period. It comprises two terms, the first coming
from the need for a local analysis, the second from convergence
arguments. Despite the fact that Theorem 2 is a local convergence
result, in the numerical simulations we will display how NTT behaves
very well even in a global sense, and for $\tau = 1$ achieves the
proven $O(h^4)$ error bound.

Remark 2: (Quadratic functions and backtracking) Condition (23) is a
locality requirement, which is rather typical in the analysis of Newton
methods. The closer the function $f(x;t)$ is to being quadratic, the
smaller the parameter $C_1$ is. When the function is quadratic, then
$C_1 = 0$, which in turn means $Q, c h^2 \to \infty$, i.e., global
convergence is achieved (as expected). When $C_1$ becomes important,
then one can think of initializing the Newton method with a
backtracking strategy (as done often in practice), see [18].

Remark 3: (Hybrid strategy) Theorem 2 also suggests a 
warm start procedure to implement the NTT algorithm. In 
particular, consider the condition $\|x_0 - x^*(t_0)\| \le c h^2$. 
Given the strong convexity assumption and the fact that the 
gradient vanishes at optimality, this condition is implied by 
the following sufficient condition 

\[ \|\nabla_x f(x_0; t_0)\| \le m c h^2, \tag{26} \]

which is easier to check in practice than condition (23) (since 
normally one does not have access to the optimizer $x^*(t_0)$). 
In fact, one might implement a hybrid strategy, where at the 
beginning we run the GTT algorithm and then we switch to 
NTT when the condition in (26) is satisfied. In order to make 
sure that the GTT algorithm eventually arrives at an error 
$\|x_k - x^*(t_k)\| \le c h^2$, we need to pick $c$ in a way that $c h^2$ 
is strictly bigger than the asymptotical error of GTT in (20). 
Therefore, we must choose $c$ as 


P"'h 

1 — p'^a 


(27) 


Hence, start with GTT and choose a sampling period h 
that verifies (22) and switch to NTT when condition (26) 
is satisfied. We will see how this strategy performs in the 
simulation results. 
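
The switching rule of Remark 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `use_newton` and `hybrid_mode` are hypothetical, the switch is assumed to be latched (once NTT is selected the sketch never returns to GTT), and the gradient norms in the usage line are made-up numbers. The value $c = 0.34$ is chosen only so that $m c h^2 = 0.0034$ for $m = 1$ and $h = 0.1$.

```python
def use_newton(grad_norm: float, m: float, c: float, h: float) -> bool:
    """Sufficient condition (26): ||grad_x f(x_k; t_k)|| <= m * c * h^2."""
    return grad_norm <= m * c * h ** 2


def hybrid_mode(grad_norms, m, c, h):
    """Label each time step 'GTT' until (26) first holds, then 'NTT' afterwards.

    Assumption of this sketch: the switch is latched, i.e. never undone.
    """
    modes, switched = [], False
    for g in grad_norms:
        switched = switched or use_newton(g, m, c, h)
        modes.append("NTT" if switched else "GTT")
    return modes


# Illustrative gradient norms only (not taken from the paper's simulations).
modes = hybrid_mode([0.5, 0.05, 0.004, 0.003, 0.01], m=1.0, c=0.34, h=0.1)
```

Checking the gradient norm rather than the distance to the optimizer is what makes the rule implementable, since $x^*(t_k)$ is not available at run time.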


C. Convergence of methods with approximated time derivative 

We focus now on the approximated version of GTT and 
NTT (i.e., the AGT and ANT algorithms), where we approx¬ 
imate the time derivative of the gradient. In the following 
theorems, we formalize the fact that this approximation does 
not affect the order of the asymptotic error w.r.t. h. 

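Theorem 3: Consider the AGT algorithm as defined in 
Algorithm 3, recall the definitions of the constants $\rho$ and $\sigma$ 
in (15), and let Assumptions 1-2 hold true. Let the stepsize $\gamma$ 
be chosen as $0 < \gamma < 2/L$, which implies $\rho < 1$. 

i) For any sampling period $h$, the sequence $\{x_k\}$ converges 
to $x^*(t_k)$ exponentially up to a bounded error as 

\[ \|x_k - x^*(t_k)\| \le \rho^{\tau k}\|x_0 - x^*(t_0)\| + \rho^{\tau}\left[\frac{2C_0 h}{m} + \frac{h^2}{2}\left(\frac{C_0^2 C_1}{m^3} + \frac{2C_0C_2}{m^2} + \frac{2C_3}{m}\right)\right]\frac{1 - \rho^{\tau k}}{1 - \rho^{\tau}}. \tag{28} \]

ii) If the sampling period $h$ is chosen such that $\rho^{\tau}\sigma < 1$, 
i.e., 

\[ h < \left[\frac{C_0C_1}{m^2} + \frac{C_2}{m}\right]^{-1}\left(\rho^{-\tau} - 1\right), \tag{29} \]

then the sequence $\{x_k\}$ converges to $x^*(t_k)$ exponentially 
up to a bounded error as 

\[ \|x_k - x^*(t_k)\| \le (\rho^{\tau}\sigma)^{k}\|x_0 - x^*(t_0)\| + \rho^{\tau}\,\frac{h^2}{2}\left[\frac{C_0^2 C_1}{m^3} + \frac{2C_0C_2}{m^2} + \frac{2C_3}{m}\right]\frac{1 - (\rho^{\tau}\sigma)^{k}}{1 - \rho^{\tau}\sigma}. \tag{30} \]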

Proof: See Appendix D. ■ 

Theorem 3 states the convergence properties of the AGT 
algorithm for different choices of the parameters. In both cases 
exponential convergence to a neighborhood is shown, with 
an accuracy depending on the sampling period $h$, 
the stepsize $\gamma$, and the number of gradient descent steps $\tau$. 
Moreover, for particular sampling period selections depending 
on smoothness properties of the objective, the asymptotic 
error bound is either an $O(h)$ or an $O(h^2)$ term. 
Notice that the convergence properties of AGT in (28) and 
(30) are identical to the convergence results of GTT in (16) 
and (18), respectively, except for the coefficients of $h^2$. To 
be more precise, the coefficient of $h^2$ in (28) and (30) is 
$C_0^2C_1/(2m^3) + C_0C_2/m^2 + C_3/m$, while the coefficient of $h^2$ 
in (16) and (18) is $C_0^2C_1/(2m^3) + C_0C_2/m^2 + C_3/(2m)$. This 
observation implies that the error bound of AGT is slightly 
larger than that of GTT, which is due to the error of 
the derivative approximation. However, the orders of the error 
bounds for these two algorithms are identical. 
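
To make the effect of the derivative approximation concrete, the following Python sketch compares the exact mixed derivative $\nabla_{tx} f$ with a first-order backward difference of the gradient for the scalar objective of Section IV-A. It is a sketch under assumptions: the backward-difference form is one natural first-order scheme and is assumed here rather than quoted from Algorithm 3, the function names are hypothetical, and the evaluation point $x = -1$ is arbitrary. By Taylor's theorem the gap should not exceed $h C_3/2$, which is the extra $O(h)$ term that enlarges the $h^2$ coefficient after multiplication by $h$ in the prediction step.

```python
import math

# Parameters of the scalar example (35), as given in Section IV-A.
OMEGA, KAPPA, MU = 0.02 * math.pi, 7.5, 1.75


def grad_x(x, t):
    """Gradient in x of f(x;t) = 0.5*(x - cos(w t))^2 + kappa*log(1 + exp(mu x))."""
    return (x - math.cos(OMEGA * t)) + KAPPA * MU / (1.0 + math.exp(-MU * x))


def grad_tx_exact(x, t):
    """Exact mixed derivative, see (36b)."""
    return OMEGA * math.sin(OMEGA * t)


def grad_tx_backward(x, t, h):
    """Assumed first-order scheme: backward difference of the gradient in time."""
    return (grad_x(x, t) - grad_x(x, t - h)) / h


h = 0.1
C3 = OMEGA ** 2  # bound on the second time derivative of the gradient, see (37f)
worst_gap = max(
    abs(grad_tx_backward(-1.0, k * h, h) - grad_tx_exact(-1.0, k * h))
    for k in range(1, 2001)
)
```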

AGT uses only first-order information of the objective 
$f(x; t_{k+1})$ to correct the predicted variable $x_{k+1|k}$, while 
ANT uses the Newton method in the correction step. Similar 
to the advantages of NTT relative to GTT, we show more 
accurate estimation for ANT relative to AGT in the following 
theorem. 


Theorem 4: Consider the ANT algorithm as defined in 
Algorithm 4, recall the definitions of the constants $\delta_1$, $\delta_2$, and 
$Q$ in (21), and let Assumptions 1-2 hold true. Further, recall 
$\tau$ as the number of Newton steps in the correction step and 
define $\delta_2'$ as 

4:.4-8g. (31) 

For any constant c > 0, if the sampling period h satisfies 


h ^ min 


^’[((l-f,5i)c + ,5')2- 



(32) 


and the initial error $\|x_0 - x^*(t_0)\|$ satisfies the condition 

\[ \|x_0 - x^*(t_0)\| \le c h^2, \tag{33} \]

then the sequence $\|x_k - x^*(t_k)\|$ generated by ANT for $k \ge 1$ 
is bounded above as 

\\xk-x*{tk)\\ ^ + 5 ' 2 f''. (34) 


Proof: See Appendix E. ■ 

Theorem 4 states that the ANT algorithm reaches an esti¬ 
mation error of order 0(/i^'^). Observe that the error bound 
in (34) for ANT is slightly worse than the bound in (24) for 
NTT, since $\delta_2' > \delta_2$. On the other hand, the bound for both 
algorithms is in the order of Oip'^). According to the results 
in Theorems 3 and 4, we can approximate the time derivative 
simply by a first-order scheme without changing the functional 
dependence of the error in h, but increasing its magnitude. In 
the simulation results, we show that this increase in error is 
in fact extremely limited. These analytical results therefore 
suggest the advantage of the proposed prediction-correction 
algorithms even in cases in which the knowledge of the time 
variability of the objective function is only estimated, which 
is important in many practical scenarios, e.g., in robotics or 
in statistical signal processing. 

IV. Numerical Experiments 

In this section, we implement the algorithms derived in 
Section II on two practical examples in order to assess 
their performance in practice. Specifically, in Section IV-A, we 
consider a simple time-varying function and apply the GTT, 
NTT, AGT, and the hybrid method of Remark 3. Additionally, 
in Section IV-B, we consider the task of designing a derivative 
control law for an autonomous system to follow a reference 
path. In this practical setting, we only consider the case where 
the time-derivative of the objective is not available, and hence 
must be approximated. Here this approximation corresponds 
to not having perfect information regarding the reference path 
the system aims to track. 

A. Scalar example 

As a simple example, consider the case where the decision 
variable $x \in \mathbb{R}$ is a scalar and the time-varying optimization 
problem is 

\[ \min_{x \in \mathbb{R}} f(x;t) := \frac{1}{2}\left(x - \cos(\omega t)\right)^2 + \kappa \log[1 + \exp(\mu x)]. \tag{35} \]

The function in (35) represents, for instance, the goal of 
staying close to a periodically varying trajectory plus a logistic 
term that penalizes large values of $x$. The terms $\omega$, $\kappa$, and $\mu$ 
are arbitrary nonnegative scalar parameters. In our experiments 
these parameters are set to $\omega = 0.02\pi$, $\kappa = 7.5$, and $\mu = 1.75$. 
The function $f(x;t)$ satisfies all the conditions in Assumptions 
1 and 2. In particular, one can compute in closed form the 
quantities 

\[ \nabla_{xx} f(x;t) = 1 + \kappa\mu^2 \frac{\exp(\mu x)}{[1 + \exp(\mu x)]^2}, \tag{36a} \]

\[ \nabla_{tx} f(x;t) = \omega \sin(\omega t), \tag{36b} \]

\[ \nabla_{xxx} f(x;t) = \kappa\mu^3 \frac{\exp(\mu x)[1 - \exp(\mu x)]}{[1 + \exp(\mu x)]^3}, \tag{36c} \]

\[ \nabla_{xtx} f(x;t) = 0, \tag{36d} \]

\[ \nabla_{ttx} f(x;t) = \omega^2 \cos(\omega t), \tag{36e} \]

Fig. 1. Error with respect to the sampling time for different algorithms 
applied to the scalar problem (35), with $h = 0.1$, $\kappa = 7.5$, $\mu = 1.5$. 

Fig. 2. Worst case error floor with respect to the sampling time interval $h$ 
for different algorithms applied to the scalar problem (35), $\kappa = 7.5$, $\mu = 1.5$. 

Fig. 3. Error with respect to the sampling time for different algorithms 
applied to the scalar problem (35), with $h = 0.1$, $\kappa = 0.1$, $\mu = 0.5$. 

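and the bounds 

\[ m = \min_{x \in \mathbb{R},\, t} \nabla_{xx} f(x;t) = 1, \tag{37a} \]

\[ L = \max_{x \in \mathbb{R},\, t} \nabla_{xx} f(x;t) = 1 + \frac{\kappa\mu^2}{4} = 6.7422, \tag{37b} \]

\[ C_0 = \max_{x \in \mathbb{R},\, t} \nabla_{tx} f(x;t) = \omega = 0.0628, \tag{37c} \]

\[ C_1 = \max_{x \in \mathbb{R},\, t} \nabla_{xxx} f(x;t) = 3.8678, \tag{37d} \]

\[ C_2 = \max_{x \in \mathbb{R},\, t} \nabla_{xtx} f(x;t) = 0, \tag{37e} \]

\[ C_3 = \max_{x \in \mathbb{R},\, t} \nabla_{ttx} f(x;t) = \omega^2 = 0.0039. \tag{37f} \]

We choose the constant stepsize as $\gamma = 0.2 < 2/L$ in the 
gradient method stated in (8) and initialize $x_0 = 0$ for all 
the algorithms. According to (17), the sampling period that 
guarantees an $O(h^2)$ error bound needs to be chosen as $h < 
1.028$ for all $\tau \ge 1$. 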
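
The numerical values above can be reproduced with a short Python sketch. Two points are assumptions of this sketch and are not quoted from the text: the closed form $\kappa\mu^3/(6\sqrt{3})$ used for $C_1$ is derived here by maximizing $s(1-s)|1-2s|$ over the sigmoid value $s \in (0,1)$, and the threshold on $h$ is taken to be $(\rho^{-\tau} - 1)/(C_0C_1/m^2 + C_2/m)$, which is the condition $\rho^{\tau}\sigma < 1$ solved for $h$.

```python
import math

# Parameters of the scalar example (35) and of the gradient correction step.
omega, kappa, mu = 0.02 * math.pi, 7.5, 1.75
gamma, tau = 0.2, 1

m = 1.0                                        # (37a)
L = 1.0 + kappa * mu ** 2 / 4.0                # (37b)
C0 = omega                                     # (37c)
C1 = kappa * mu ** 3 / (6.0 * math.sqrt(3.0))  # (37d), closed form derived in this sketch
C2 = 0.0                                       # (37e)
C3 = omega ** 2                                # (37f)

# Contraction factor of one gradient step and the largest admissible sampling period.
rho = max(abs(1.0 - gamma * m), abs(1.0 - gamma * L))
h_max = (rho ** (-tau) - 1.0) / (C0 * C1 / m ** 2 + C2 / m)
```

Since $\rho^{-\tau}$ grows with $\tau$, the threshold obtained for $\tau = 1$ is the most restrictive one, which is why a single value of $h$ covers all $\tau \ge 1$.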

In Figure 1, we plot the error $\|x_k - x^*(t_k)\|$ versus the 
discrete time $t_k$ for a sampling period of $h = 0.1$, for different 
schemes, along with the asymptotical bounds computed via 
Theorems 1 and 3. Observe that the running gradient (RG) 



Fig. 4. Worst case error floor with respect to the sampling time interval $h$ 
for different algorithms applied to the scalar problem (35), $\kappa = 0.1$, $\mu = 0.5$. 

method [24] which uses only a gradient correction step (and 
no prediction) performs the worst, achieving an error of 10 “^, 
while GTT for $\tau = 1$, $\tau = 3$, and $\tau = 5$ achieves an error 
of approximately 10“^. Numerically we may conclude that 
tracking with gradient-based prediction (GTT) for different 
values of $\tau$ has a better error performance than running, even 
in the case we use an approximate time derivative (AGT); in 
addition, tracking with Newton-based prediction (NTT) with 
$\tau = 1$ achieves a superior performance compared to the others, 
i.e., an error stabilizing near 10 “^° is achieved. 

In Figure 1, we also display the behavior of the hybrid 
strategy advocated in Remark 3. After we 
switch to NTT (when the condition $\|\nabla_x f(x_k; t_k)\| \le 0.0034$, 
derived from (27), is met), we regain the same performance 
as NTT in only one step. 
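
The qualitative ordering reported for Figure 1 can be checked with the following Python sketch of RG, GTT, and NTT on the scalar problem (35). It is a minimal sketch, not the authors' simulation code: the reference optimizer is computed here by warm-started Newton iterations, the worst-case error is taken over the second half of a run of 3000 steps (an arbitrary choice), and all function names are hypothetical. The prediction step follows the form $x_{k+1|k} = x_k - h[\nabla_{xx} f]^{-1}\nabla_{tx} f$ used in Appendix B.

```python
import math

OMEGA, KAPPA, MU = 0.02 * math.pi, 7.5, 1.75
GAMMA, H, STEPS = 0.2, 0.1, 3000


def sig(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad(x, t):
    """Gradient of f(x;t) with respect to x."""
    return x - math.cos(OMEGA * t) + KAPPA * MU * sig(MU * x)


def hess(x):
    """Hessian (36a); it does not depend on t for this objective."""
    s = sig(MU * x)
    return 1.0 + KAPPA * MU ** 2 * s * (1.0 - s)


def grad_t(t):
    """Mixed derivative (36b)."""
    return OMEGA * math.sin(OMEGA * t)


def optimizer(t, x0):
    """Reference solution x*(t) via Newton iterations started at x0."""
    x = x0
    for _ in range(50):
        x -= grad(x, t) / hess(x)
    return x


def track(predict, newton, tau=1):
    """Run one tracking scheme and return its worst error over the last half."""
    x, x_star, errors = 0.0, optimizer(0.0, 0.0), []
    for k in range(STEPS):
        t, t_next = k * H, (k + 1) * H
        if predict:                      # prediction step at time t_k
            x -= H * grad_t(t) / hess(x)
        for _ in range(tau):             # correction steps at time t_{k+1}
            step = 1.0 / hess(x) if newton else GAMMA
            x -= step * grad(x, t_next)
        x_star = optimizer(t_next, x_star)
        errors.append(abs(x - x_star))
    return max(errors[STEPS // 2:])


floor_rg = track(predict=False, newton=False)   # running gradient
floor_gtt = track(predict=True, newton=False)   # GTT with tau = 1
floor_ntt = track(predict=True, newton=True)    # NTT with tau = 1
```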

The differences in performance can be also appreciated by 
varying h and observing the worst case error floor size which 
is defined as maxi.^j^{\\xk — x*(<fc)||}, where k = 10 ^ in 
the simulations. Figure 2 illustrates the error as a function 
of $h$. The performance differences between the proposed 
methods that may be observed here corroborate the differences 
evident in Figure 1. In particular, the running method achieves 
the largest worst-case error, followed in descending 
order by AGT, GTT with increasing $\tau$, and lastly NTT (or 
equivalently the hybrid strategy), which achieves the minimal 
worst-case error. Notice also the dashed lines displaying 
the theoretical performance of $O(h)$, $O(h^2)$, and $O(h^4)$, which 
are attained in this simulation. 

Fig. 5. Sample trajectories of the object to be tracked (dashed) and trajectories 
generated by the different algorithms (continuous). All algorithms track the 
optimum effectively, yet AGT and ANT track $x^*(t)$ closer than RG. 

We continue the simulation example by changing the 
parameters $\kappa$ and $\mu$ in (35) to the values $\kappa = 0.1$ and $\mu = 0.5$. 
This gives $L = 1.0063$ and a condition number $L/m$ close 
to 1. In this setting a first-order method, such as the gradient method, 
is expected to perform better than in the case of high condition 
numbers (as in the previous example). We pick the stepsize 
$\gamma = 1 < 2/L$. In Figures 3 and 4, we appreciate how the 
relative performances of GTT and NTT change with the new 
parameters^. 

B. Target Tracking Experiments 

The second numerical example consists of a more realistic 
application scenario. We consider an autonomous system (i.e., 
a mobile robot) which is charged with the task of following an 
object whose position varies continuously in time. Denote 
the reference trajectory of this object as a curve $y(t)$, i.e., a 
function $y : \mathbb{R}_+ \to \mathbb{R}^n$, and let $x \in \mathbb{R}^n$ be the decision variable 
of the robot, in terms of the waypoint it aims to reach next. 
We aim to solve tracking problems of the form 

\[ \min_{x} f(x;t) := \frac{1}{2}\left(\|x - y(t)\|^2 + \mu_1 \exp\left(\mu_2 \|x - b\|^2\right)\right), \tag{38} \]

which corresponds to tracking the reference path $y(t)$ while 
remaining close enough to a base station located at $b$, which 
may correspond to a recharging station or a domain constraint 
associated with maintaining viable communications. Using the 
methods developed in Section II for problems of this type 

^The code of the simulation example will be made available for the readers, 
to appreciate how different stepsizes may influence the asymptotical bounds. 


Fig. 6. Error [m] with respect to the sampling time $t_k$ for $h = 1$ [s] for 
different algorithms applied to the tracking problem (38). 

corresponds to deriving derivative-based control laws for fully 
actuated systems with simple integrator dynamics. 

For the example considered here, we take the planar 
case ($n = 2$) and fix $\mu_1 = 1000$ m$^2$, $\mu_2 = 0.005$ m$^{-2}$, 
with the base located at $b = [100; 100]$ m. In addition, we 
suppose the target trajectory $y(t)$ follows the specified path 

\[ y(t) = 100\,[\cos(\omega t), \sin(3\omega t)] \ \text{m}, \]

where $\omega = 0.01$ Hz. Moreover, the position domain is given as 
$X = [-150, 150] \times [-150, 150]$ m$^2$ and we know that $x^*(t) \in 
X$. We can compute the constants of Assumptions 1 and 2 
over $X \subset \mathbb{R}^n$ [cf. Remark 1]: $m = 1.01$, $L = 3.45$, $C_0 = 
3.16$ [m/s], $C_1 = 0.06$ [m$^{-1}$], $C_2 = 0$, $C_3 = 0.10$ [m/s$^2$]. We 
select stepsize $\gamma = 0.05 < 2/L$. With these parameters and 
$h = 1$ s, the target moves with a maximum speed of 3.16 m/s. 
This is comparable with the speed of current quad-rotors (max 
speed ~10 m/s). 

In any practical setting, the actuation capability of an 
autonomous system is limited either in terms of velocity 
or degrees of freedom. We consider the case where the 
autonomous system may move with the same number of 
degrees as its decision variable dimension, i.e. it may move 
in any direction, yet its maximum velocity is limited to some 
value $v_{\max}$. A typical maximum velocity for ground vehicles 
is $v_{\max} = 4$ m/s, which is the choice made in the numerical 
experiments here. Thus, we modify our algorithms to account 
for this constraint by rescaling the prediction-correction step 
to the allowable velocity limit. Of course more complicated 
actuation models may be considered, but these are beyond the 
scope of this work. 
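
One way to implement such a rescaling is sketched below in Python. The exact rule used in the experiments is not spelled out in the text, so this is an assumption: the displacement from the previous waypoint to the newly computed one is scaled down so that its length never exceeds $v_{\max} h$, and is left unchanged otherwise. The function name `saturate_step` and the numbers in the usage line are hypothetical.

```python
import math


def saturate_step(x_prev, x_target, v_max, h):
    """Scale the displacement x_target - x_prev so its length is at most v_max * h."""
    dx = [b - a for a, b in zip(x_prev, x_target)]
    dist = math.hypot(*dx)
    limit = v_max * h
    if dist <= limit:
        return list(x_target)            # target reachable within one sampling period
    scale = limit / dist
    return [a + scale * d for a, d in zip(x_prev, dx)]


# Illustrative planar example: a 10 m displacement is cut to 4 m for v_max = 4 m/s, h = 1 s.
waypoint = saturate_step([0.0, 0.0], [6.0, 8.0], v_max=4.0, h=1.0)
```

The direction of motion is preserved, so the saturated step still points toward the waypoint produced by the prediction-correction update.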

We show the result of this experiment in terms of the actual 
reference path and trajectories generated by the approximate 
algorithms AGT and ANT in Figure 5 over a truncated time 
interval $0 \le t \le 300$ s. The reference trajectory $y(t)$ is the 
dotted line, and the optimal continuous-time trajectory $x^*(t)$ 
associated with solving (38) is in blue. By running gradient we 
mean a method which has no prediction step, and operates only 
by correction. Observe that the trajectories generated by running 
gradient (RG), AGT, and ANT successfully track the optimal 
trajectory and consequently the reference path $y(t)$ up 
to a small error. 

TABLE I 
Number of correction steps to keep the same computational time 

Fig. 7. Worst case error floor with respect to the sampling time interval $h$ 
for different algorithms applied to the tracking problem (38). 

This trend may be more easily observed in Figure 6, which 
shows the magnitude of the difference between the generated 
path and the optimal path, $\|x^*(t_k) - x_k\|$, i.e., the tracking 
error, as a function of the sampling time $t_k$. Note that the 
asymptotical bounds computed via Theorems 1 and 3 are less 
meaningful here since the velocity of the robot is scaled. The 
approximate steady state errors achieved by RG, AGT, and 
ANT are respectively 10, 10“^, and 10“®. AGT experiences 
comparable levels of error across different values of r, the 
number of correction steps, and ANT far outperforms the 
other methods. This pattern is corroborated in Figure 7, which 
plots the worst-case error \\x*{tk) — Xk\\ versus the 

sampling interval size h for k = 8 x 10^. In particular, we 
observe that RG experiences an error comparable to $O(h)$, 
as theoretically guaranteed, whereas our proposed methods 
AGT and ANT achieve a worst-case error of approximately 
$O(h^2)$ and $O(h^4)$, respectively. Observe that as the problem 
(38) is sampled less often, i.e. when h increases, the optimality 
gap increases. 

Computational Considerations. We empirically observe 
ANT to far outperform the other methods; however, this 
performance gap ignores the increased computational cost 
associated with Newton steps. To obtain a fairer 
comparison, we consider how the different algorithms perform 
when the computational time per correction and prediction 
step is fixed. Theoretically, each prediction step and Newton 
step requires $O(n^3)$ computations (because of the matrix 
inversion), while the gradient step requires only $O(n)$. Practically, in 
this simulation setting, the most demanding task is however the 
evaluation of the gradient and the Hessian, while the actual 
prediction or correction step is less critical (less than 1/10 
of the time). In particular, evaluating the Hessian requires twice the 
computational effort of evaluating the gradient, so a Newton 
step is three times slower than a gradient step. 

The workflow for each optimization iteration is the following: 

$t_k$) A new function is acquired; 

1) A new waypoint $x_k$ is generated via a correction 
step; 

2) The waypoint is implemented and the robot moves; 

3) Either a new prediction $x_{k+1|k}$ is made, based on 
past information, or the correction is refined by 
more correction steps. 

Fig. 8. Worst case error [m] w.r.t. $h$ [s] with fixed computational complexity. 

We see that at step 3 the robot can either implement 
the prediction part of our prediction-correction algorithms, or 
refine the correction to have, perhaps, a better starting point 
when the next function is acquired. We consider here the 
running gradient RG (which we remark is nothing but 
AGT without prediction), the AGT, the ANT, and a running 
version of the Newton method, which uses only correction 
steps (later indicated as RN). 

We now outline how Table I is generated. We set $\Delta t_c = 
h/10$ as the allowable computational time for the correction 
step (step 1), and we set the gradient evaluation to require 
1/120 s. As a consequence, in this setting the robot can 
perform only $\tau = 1$ gradient correction step for a sampling 
time of $h = 0.1$ s. With this as our basic unit of measurement, 
we fill in Table I with how many gradient evaluations $\tau$ may be 
afforded as the sampling interval $h$ increases. As previously 
noted, ANT requires three times the computation time of AGT, 
and consequently experiences too much latency to be used 
when $h = 0.1$ s. 
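
The budget arithmetic behind this description can be sketched in Python as follows. It is an illustration under assumptions: the number of affordable steps is taken to be the floor of the time budget divided by the cost per step, the function name `affordable_steps` is hypothetical, and the sampling periods other than $h = 1/10$ s are example values, not entries quoted from Table I. Exact fractions are used to avoid floating-point rounding at the boundaries.

```python
from fractions import Fraction
from math import floor

GRAD_EVAL = Fraction(1, 120)  # seconds per gradient evaluation (from the text)
NEWTON_COST = 3               # a Newton step costs three gradient evaluations (from the text)


def affordable_steps(h):
    """Return (gradient steps, Newton steps) that fit in the budget h/10."""
    budget = Fraction(h) / 10
    grad_steps = floor(budget / GRAD_EVAL)
    newton_steps = floor(budget / (NEWTON_COST * GRAD_EVAL))
    return grad_steps, newton_steps


# Example sampling periods in seconds (hypothetical apart from h = 1/10).
budget_table = {h: affordable_steps(h) for h in ("1/10", "1/4", "1/2", "1")}
```

For $h = 1/10$ s this gives one gradient step and no Newton step, in agreement with the statement that ANT cannot be used at that sampling period.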

We set $\Delta t_p = 1/40$ s as the allowable computational 
time for step 3, so that we can either run one prediction step, 
3 gradient correction refinement steps, or 1 Newton correction 
refinement step. 

We run the different algorithms when the computation time 
is fixed (i.e., for $h = 0.1$ s, in step 1, $\tau = 1$ step of RG 
and AGT may be afforded, but zero of ANT) and record 
the worst-case error achieved versus $h$ in Figure 8. We run 
RG both with 3 additional gradient refinement steps (3G) 
and with 1 Newton refinement step (1N), while RN is run 
with 1 Newton refinement step (1N). Broadly, one may observe 
that if ANT may be afforded (i.e., for large $h$), it is much 
preferable to AGT regardless of the number of correction 
steps $\tau$. However, for small sampling periods $h$, i.e., when 
one requires very low latencies in the control loop, ANT is 
infeasible. We also observe that prediction is to be preferred 
to additional refinement steps, especially when the sampling 
period is small (i.e., when the time derivative approximation 
makes a significant difference because one does not have 
enough time to perform many correction steps). 

V. Conclusion 

We have designed algorithms to track the solution of time- 
varying unconstrained and strongly convex optimization prob¬ 
lems. These algorithms leverage the knowledge of how the 
cost function changes in time and are based on a predictor- 
corrector scheme. We have also developed approximation 
schemes for when the rate at which the objective varies in 
time is not known. We established that these methods yield 
convergence to a neighborhood of the optimal trajectory, with 
a neighborhood of convergence dependent on the sampling 
period. Moreover, the size of this neighborhood is an order 
of magnitude smaller than state-of-the-art running algorithms 
which only perform correction steps. In some cases when the 
problem parameters are appropriately chosen and second-order 
information is incorporated, the neighborhood of the optimal 
trajectory to which the algorithm converges is several orders 
of magnitude smaller than existing approaches. 

Moreover, we conducted a numerical analysis of the pro¬ 
posed methods in a simple setting which empirically supported 
the established error bounds. We also considered the task 
of developing a control strategy for an autonomous system 
to follow an object whose position varies continuously in 
time, showing that the developed tools yield an effective 
strategy. In some cases, the algorithms which achieve higher 
accuracy require too much computational latency to be used in 
a closed loop control setting; however, when this latency may 
be afforded, the second-order methods yield highly accurate 
tools. 

Future research directions encompass the generalization 
of this work to constrained problems, general convex cost 
functions, as well as approximate second-order methods to 
weaken the computational requirements of computing the 
Hessian inverse in the prediction step. 

Appendix A 

Proof of Proposition 1 

Let us analyze the forward Euler method applied to the 
vector-valued nonlinear dynamical system 

\[ \dot{x} = F(x(t), t). \tag{39} \]

If we apply the forward Euler method to the relation in (39), 
starting at a certain point $x(t_k)$, we obtain 

\[ x_{k+1|k} = x(t_k) + h F(x(t_k), t_k). \tag{40} \]

On the other hand, we can write $x(t_{k+1})$ by using a Taylor 
expansion as 

\[ x(t_{k+1}) = x(t_k) + h F(x(t_k), t_k) + \frac{h^2}{2}\frac{d}{dt}F(x(s), s), \tag{41} \]

for a certain time $s \in [t_k, t_{k+1}]$. Subtracting $x(t_{k+1})$ from 
both sides of the equality in (40) and computing the norm of 
the resulting relation implies that 

\[ \|x_{k+1|k} - x(t_{k+1})\| = \frac{h^2}{2}\left\|\frac{d}{dt}F(x(s), s)\right\|. \tag{42} \]

By considering the definition of the discretization error vector 
$\Delta_k := x_{k+1|k} - x(t_{k+1})$, we can write (42) as 

\[ \|\Delta_k\| = \frac{h^2}{2}\left\|\frac{d}{dt}F(x(s), s)\right\|. \tag{43} \]

We proceed to find an upper bound for the right-hand side of 
(43). Observing the continuous dynamical system in (6) we 
know that $F(x(t), t)$ is given by 

\[ F(x(t), t) = -[\nabla_{xx} f(x;t)]^{-1}\nabla_{tx} f(x;t). \tag{44} \]

Then, by the chain rule we can write 

\[ \frac{d}{dt}F(x(t), t) = \nabla_t F(x,t) + [\nabla_x F(x,t)]\,\dot{x} = \nabla_t F(x,t) + [\nabla_x F(x,t)]\,F(x(t), t), \tag{45} \]

where we have used the relation (39). By using the triangle 
inequality, we can upper bound the norm of the right-hand 
side of (45) as 

\[ \left\|\frac{d}{dt}F(x(t), t)\right\| \le \|\nabla_t F(x,t)\| + \|[\nabla_x F(x,t)]\,F(x(t), t)\|. \tag{46} \]

We now upper bound the right-hand side of (46) by 
analyzing its two components. First, based on the definition 
in (44), the partial derivative w.r.t. time can be written as 
$\nabla_t F(x,t) = -\nabla_t\left[[\nabla_{xx} f(x;t)]^{-1}\nabla_{tx} f(x;t)\right]$. By applying 
the chain rule, 

\[ \nabla_t\left[[\nabla_{xx} f(x;t)]^{-1}\nabla_{tx} f(x;t)\right] = [\nabla_{xx} f(x;t)]^{-1}\nabla_{ttx} f(x;t) - [\nabla_{xx} f(x;t)]^{-2}\nabla_{txx} f(x;t)\nabla_{tx} f(x;t). \tag{47} \]

Compute the norm of both sides of (47). Substitute the norm 
$\|\nabla_t[[\nabla_{xx} f(x;t)]^{-1}\nabla_{tx} f(x;t)]\|$ by $\|\nabla_t F(x,t)\|$. Further, 
apply the triangle inequality to the right-hand side of the 
resulting expression to obtain 

\[ \|\nabla_t F(x,t)\| \le \|[\nabla_{xx} f(x;t)]^{-2}\nabla_{txx} f(x;t)\nabla_{tx} f(x;t)\| + \|[\nabla_{xx} f(x;t)]^{-1}\nabla_{ttx} f(x;t)\|. \tag{48} \]

Observe the fact that $\nabla_{txx} f(x;t) = \nabla_{xtx} f(x;t)$. We use the 
Cauchy-Schwarz inequality and the bounds in Assumptions 1 
and 2 to update the upper bound in (48) as 

\[ \|\nabla_t F(x,t)\| \le \frac{C_0C_2}{m^2} + \frac{C_3}{m}. \tag{49} \]

We can now do the same for the second component of the 
right-hand side of (46), and in particular 

\[ \|\nabla_x F(x,t)\,F(x(t), t)\| = \left\|\left([\nabla_{xx} f(x;t)]^{-1}\nabla_{xtx} f(x;t) - [\nabla_{xx} f(x;t)]^{-2}\nabla_{xxx} f(x;t)\nabla_{tx} f(x;t)\right)F(x(t), t)\right\| \le \left(\frac{C_0C_1}{m^2} + \frac{C_2}{m}\right)\frac{C_0}{m}. \tag{50} \]

By combining the relation in (43) and (46) with the upper 
bounds in (49) and (50), the claim in (14) follows. ■ 


Appendix B 
Proof of Theorem 1 

In order to prove Theorem 1, we start by bounding the 
error in the prediction step by the terms that depend on the 
functional smoothness and the discretization error using Taylor 
expansions. Then we bound the tracking error of the gradient 
step using convergence properties of the gradient method on strongly 
convex functions. By substituting the error of the prediction 
step into that of the correction step, we establish the main result. 

First, we establish that the discrete-time sampling error bound 
stated in (18) is achieved by the updates (7)-(8). For simplicity, 
we modify the notation to omit the arguments $x_k$ and $t_k$ of 
the function $f$. In particular, define 

\[ \nabla_{xx} f := \nabla_{xx} f(x_k; t_k), \quad \nabla_{tx} f := \nabla_{tx} f(x_k; t_k), \quad \nabla_{xx} f^* := \nabla_{xx} f(x^*(t_k); t_k), \quad \nabla_{tx} f^* := \nabla_{tx} f(x^*(t_k); t_k). \tag{51} \]

Begin by considering the update in (7), the prediction step, 
evaluated at a generic point $x_k$ sampled at the current sample 
time $t_k$ and with associated optimizer $x^*(t_k)$, which due to 
optimality will have null residual vector $r(t) = 0$. Thus we 
may write 

\[ x_{k+1|k} = x_k - h[\nabla_{xx} f]^{-1}\nabla_{tx} f, \qquad x^*(t_{k+1}) = x^*(t_k) - h[\nabla_{xx} f^*]^{-1}\nabla_{tx} f^* - \Delta_k, \tag{52} \]

where $\Delta_k$ is the discretization error of the Euler step started at $x^*(t_k)$ (cf. Appendix A). 
By subtracting the equalities in (52), considering the norm of 
the resulting expression, and applying the triangle inequality 
we obtain 

\[ \|x_{k+1|k} - x^*(t_{k+1})\| \le \|x_k - x^*(t_k)\| + h\left\|[\nabla_{xx} f]^{-1}\nabla_{tx} f - [\nabla_{xx} f^*]^{-1}\nabla_{tx} f^*\right\| + \|\Delta_k\|. \tag{53} \]

Substituting the discretization error norm $\|\Delta_k\|$ by its upper 
bound in (14) yields 

\[ \|x_{k+1|k} - x^*(t_{k+1})\| \le \|x_k - x^*(t_k)\| + \frac{h^2}{2}\left[\frac{C_0^2C_1}{m^3} + \frac{2C_0C_2}{m^2} + \frac{C_3}{m}\right] + h\left\|[\nabla_{xx} f]^{-1}\nabla_{tx} f - [\nabla_{xx} f^*]^{-1}\nabla_{tx} f^*\right\|. \tag{54} \]

We proceed to find an upper bound for the norm 
$\|[\nabla_{xx} f]^{-1}\nabla_{tx} f - [\nabla_{xx} f^*]^{-1}\nabla_{tx} f^*\|$ in the right-hand side 
of (54). By adding and subtracting the term $[\nabla_{xx} f^*]^{-1}\nabla_{tx} f$ 
and using the triangle inequality we can write 

\[ \left\|[\nabla_{xx} f]^{-1}\nabla_{tx} f - [\nabla_{xx} f^*]^{-1}\nabla_{tx} f^*\right\| \le \left\|[\nabla_{xx} f]^{-1}\nabla_{tx} f - [\nabla_{xx} f^*]^{-1}\nabla_{tx} f\right\| + \left\|[\nabla_{xx} f^*]^{-1}\nabla_{tx} f - [\nabla_{xx} f^*]^{-1}\nabla_{tx} f^*\right\|. \tag{55} \]

We may bound the first and second-order derivative terms in 
(55) by using Assumption 2 regarding the functional 
smoothness as well as the strong convexity constant $m$ of the Hessian 
in Assumption 1 to write 

\[ \left\|[\nabla_{xx} f]^{-1}\nabla_{tx} f - [\nabla_{xx} f^*]^{-1}\nabla_{tx} f^*\right\| \le C_0\left\|[\nabla_{xx} f]^{-1} - [\nabla_{xx} f^*]^{-1}\right\| + \frac{1}{m}\left\|\nabla_{tx} f - \nabla_{tx} f^*\right\|. \tag{56} \]

We now further bound the first term of the right-hand side. To 
do that, we use the non-singularity of the Hessian to write 

\[ \left\|[\nabla_{xx} f]^{-1} - [\nabla_{xx} f^*]^{-1}\right\| = \left\|[\nabla_{xx} f^*]^{-1}(\nabla_{xx} f - \nabla_{xx} f^*)[\nabla_{xx} f]^{-1}\right\|, \tag{57} \]

which by employing, once again, the strong convexity constant 
$m$ of the Hessian in Assumption 1 we can bound as 

\[ \left\|[\nabla_{xx} f]^{-1} - [\nabla_{xx} f^*]^{-1}\right\| \le \frac{1}{m^2}\left\|\nabla_{xx} f - \nabla_{xx} f^*\right\|. \tag{58} \]

Substituting the upper bound in (58) for the norm 
$\|[\nabla_{xx} f]^{-1} - [\nabla_{xx} f^*]^{-1}\|$ into (56) yields 

\[ \left\|[\nabla_{xx} f]^{-1}\nabla_{tx} f - [\nabla_{xx} f^*]^{-1}\nabla_{tx} f^*\right\| \le \frac{C_0}{m^2}\left\|\nabla_{xx} f - \nabla_{xx} f^*\right\| + \frac{1}{m}\left\|\nabla_{tx} f - \nabla_{tx} f^*\right\|. \tag{59} \]

We consider the Taylor expansion of the second-order term in 
(59), and apply the Mean Value Theorem with $\tilde{x}$ as a point 
on the line between $x_k$ and $x^*(t_k)$ to obtain 

\[ \left\|\nabla_{xx} f - \nabla_{xx} f^*\right\| \le \left\|\nabla_{xxx} f(\tilde{x}; t_k)\right\|\,\|x_k - x^*(t_k)\| \le C_1\|x_k - x^*(t_k)\|. \tag{60} \]

Applying the same argument for the mixed second-order term 
implies 

\[ \left\|\nabla_{tx} f - \nabla_{tx} f^*\right\| \le \left\|\nabla_{xtx} f(\tilde{x}; t_k)\right\|\,\|x_k - x^*(t_k)\| \le C_2\|x_k - x^*(t_k)\|. \tag{61} \]

The expressions in (60) and (61) may be substituted together 
into (59) to yield 

\[ \left\|[\nabla_{xx} f]^{-1}\nabla_{tx} f - [\nabla_{xx} f^*]^{-1}\nabla_{tx} f^*\right\| \le \left(\frac{C_0C_1}{m^2} + \frac{C_2}{m}\right)\|x_k - x^*(t_k)\|. \tag{62} \]

By substituting the upper bound in (62) into (54) and 
considering the definition of $\sigma$ in (15), we obtain that 

\[ \|x_{k+1|k} - x^*(t_{k+1})\| \le \sigma\|x_k - x^*(t_k)\| + \frac{h^2}{2}\left[\frac{C_0^2C_1}{m^3} + \frac{2C_0C_2}{m^2} + \frac{C_3}{m}\right]. \tag{63} \]

For the correction step [cf. (8)], we may use the standard 
property of gradient descent for strongly convex functions with 
Lipschitz gradients. In particular, the Euclidean error norm of 
the gradient descent method converges as 

\[ \|\hat{x}^{s+1}_{k+1} - x^*(t_{k+1})\| \le \rho\,\|\hat{x}^{s}_{k+1} - x^*(t_{k+1})\|, \tag{64} \]

where $\rho = \max\{|1 - \gamma m|, |1 - \gamma L|\}$. To see this, it is sufficient 
to write the gradient step as 

\[ \|\hat{x}^{s+1}_{k+1} - x^*(t_{k+1})\| = \|\hat{x}^{s}_{k+1} - \gamma\nabla_x f(\hat{x}^{s}_{k+1}; t_{k+1}) - x^*(t_{k+1})\|. \tag{65} \]

According to the optimality condition we can write 
$\nabla_x f(x^*(t_{k+1}); t_{k+1}) = 0$. Considering this observation and 
the equality in (65) we obtain 

\[ \|\hat{x}^{s+1}_{k+1} - x^*(t_{k+1})\| = \left\|\hat{x}^{s}_{k+1} - x^*(t_{k+1}) - \gamma\left[\nabla_x f(\hat{x}^{s}_{k+1}; t_{k+1}) - \nabla_x f(x^*(t_{k+1}); t_{k+1})\right]\right\|. \tag{66} \]

Consider now the continuous function $g : \mathbb{R}^n \times \mathbb{R}_+ \to \mathbb{R}^n$ 
defined as $g(x;t) := x - \gamma\nabla_x f(x;t)$. Given the boundedness 
of the Hessian and the strong convexity of $f(x;t)$, the gradient 
of $g(x;t)$ is bounded as [38, page 13] 

\[ \|\nabla_x g(x;t)\| \le \max\{|1 - \gamma m|, |1 - \gamma L|\} = \rho, \tag{67} \]

for all $x \in \mathbb{R}^n$. The bound (67) implies that $g(x;t)$ is 
Lipschitz, therefore we can upper bound (66) as 

\[ \|\hat{x}^{s+1}_{k+1} - x^*(t_{k+1})\| = \|g(\hat{x}^{s}_{k+1}; t_{k+1}) - g(x^*(t_{k+1}); t_{k+1})\| \le \rho\,\|\hat{x}^{s}_{k+1} - x^*(t_{k+1})\|. \tag{68} \]

Notice that the relation (68) is equivalent to the claim in (64). 
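
The contraction property of the gradient map can be checked numerically. The Python sketch below does so for the scalar objective (35) with the stepsize of Section IV-A; it is only an illustration, with the sampling interval $[-5, 5]$, the random seed, and the number of sample pairs chosen arbitrarily. The map `g` is $g(x;t) = x - \gamma\nabla_x f(x;t)$, and the largest observed ratio $|g(x) - g(y)|/|x - y|$ should not exceed $\rho = \max\{|1 - \gamma m|, |1 - \gamma L|\}$.

```python
import math
import random

OMEGA, KAPPA, MU, GAMMA = 0.02 * math.pi, 7.5, 1.75, 0.2
M, L = 1.0, 1.0 + KAPPA * MU ** 2 / 4.0
RHO = max(abs(1.0 - GAMMA * M), abs(1.0 - GAMMA * L))


def g(x, t=0.0):
    """Gradient map g(x;t) = x - gamma * grad_x f(x;t) for the scalar problem (35)."""
    grad = x - math.cos(OMEGA * t) + KAPPA * MU / (1.0 + math.exp(-MU * x))
    return x - GAMMA * grad


random.seed(0)
worst_ratio = 0.0
for _ in range(10000):
    x, y = random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0)
    if abs(x - y) < 1e-6:   # skip near-identical pairs to avoid rounding noise
        continue
    worst_ratio = max(worst_ratio, abs(g(x) - g(y)) / abs(x - y))
```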

Observe that the sequence $\hat{x}^{s}_{k+1}$ is initialized by the 
predicted variable $\hat{x}^{0}_{k+1} = x_{k+1|k}$ and the corrected variable $x_{k+1}$ is equal 
to $\hat{x}^{\tau}_{k+1}$. Considering these observations and the relation in 
(64) between two consecutive iterates of the sequence $\hat{x}^{s}_{k+1}$, 
we can write 

\[ \|x_{k+1} - x^*(t_{k+1})\| \le \rho^{\tau}\|x_{k+1|k} - x^*(t_{k+1})\|. \tag{69} \]

We are ready to consider the combined error bound achieved 
by the prediction-correction scheme. By plugging the prediction 
error bound of (63) into the correction error bound of (69) we obtain 

\[ \|x_{k+1} - x^*(t_{k+1})\| \le \rho^{\tau}\sigma\|x_k - x^*(t_k)\| + \rho^{\tau}\Gamma, \tag{70} \]

where $\Gamma := (h^2/2)\left[C_0^2C_1/m^3 + 2C_0C_2/m^2 + C_3/m\right]$ is 
defined to simplify the notation. Notice that the relation 
between $\|x_{k+1} - x^*(t_{k+1})\|$ and $\|x_k - x^*(t_k)\|$ in (70) also 
holds true for $\|x_k - x^*(t_k)\|$ and $\|x_{k-1} - x^*(t_{k-1})\|$, i.e., 

\[ \|x_k - x^*(t_k)\| \le \rho^{\tau}\sigma\|x_{k-1} - x^*(t_{k-1})\| + \rho^{\tau}\Gamma. \tag{71} \]

Substituting the upper bound in (71) for $\|x_k - x^*(t_k)\|$ into 
(70) implies an upper bound for $\|x_{k+1} - x^*(t_{k+1})\|$ in terms 
of the norm difference for time $k - 1$ as 

\[ \|x_{k+1} - x^*(t_{k+1})\| \le (\rho^{\tau}\sigma)^2\|x_{k-1} - x^*(t_{k-1})\| + \rho^{\tau}\Gamma(\rho^{\tau}\sigma + 1). \tag{72} \]

Now recursively apply the relationship (70) backwards in time 
to the initial time sample and use the same argument from (70) 
to (72) to write 

\[ \|x_{k+1} - x^*(t_{k+1})\| \le (\rho^{\tau}\sigma)^{k+1}\|x_0 - x^*(t_0)\| + \rho^{\tau}\Gamma\sum_{i=0}^{k}(\rho^{\tau}\sigma)^{i}. \tag{73} \]

Substituting $k + 1$ by $k$ and simplifying the sum in (73) 
(remembering that $\rho^{\tau}\sigma < 1$) leads to 

\[ \|x_k - x^*(t_k)\| \le (\rho^{\tau}\sigma)^{k}\|x_0 - x^*(t_0)\| + \rho^{\tau}\Gamma\left[\frac{1 - (\rho^{\tau}\sigma)^{k}}{1 - \rho^{\tau}\sigma}\right]. \tag{74} \]

Considering the result in (74) and the definition for the 
constant $\Gamma$, the result in (18) follows. 
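
The step from the one-step recursion to the closed-form bound is a geometric sum, which the following Python sketch verifies numerically. The values of $\rho^{\tau}$, $\sigma$, and $\Gamma$ are illustrative, rounded numbers computed here from the scalar example constants with $h = 0.1$, $\gamma = 0.2$, and $\tau = 1$; they are assumptions of the sketch, and the function names are hypothetical.

```python
def recursion_bound(e0, a, b, k):
    """Iterate e <- a*e + b for k steps, with a = rho^tau*sigma and b = rho^tau*Gamma."""
    e = e0
    for _ in range(k):
        e = a * e + b
    return e


def closed_form_bound(e0, a, b, k):
    """Closed form a^k*e0 + b*(1 - a^k)/(1 - a), valid for a < 1."""
    return a ** k * e0 + b * (1.0 - a ** k) / (1.0 - a)


# Illustrative rounded values (scalar example, h = 0.1, gamma = 0.2, tau = 1).
rho_tau, sigma, Gamma = 0.8, 1.0243, 9.6e-5
a, b = rho_tau * sigma, rho_tau * Gamma

iterated = recursion_bound(1.0, a, b, 200)
closed = closed_form_bound(1.0, a, b, 200)
asymptote = b / (1.0 - a)   # limit of the bound as k grows
```

Because $a < 1$, the contribution of the initial error decays geometrically and the bound settles at $\rho^{\tau}\Gamma/(1 - \rho^{\tau}\sigma)$, which scales as $h^2$ through $\Gamma$.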

To establish the result stated in (16), observe that in the 
worst case, we may upper bound the term || — 


[Vxx/*] ^Via;/*II in (53) by using the bounds in Assump¬ 
tion 2 to obtain the right-hand side of the following expression 

2C 

l|[Va,x/]-'vta=/ - [y.^rr"yt^r\\ ^ . (75) 

II II ^ 

Substituting the bound in (75) into (54) yields 

2C 

\\xk+i\k - x*{tk+i)\\ < llatfc - a;*( 4 )|| H- h —^ 

^ rcgci ^ 2 C 0 C 2 ^ 

2 [ rrF m? m 

(76) 

To simplify the notation we define a new constant 
F 2 := 2hCo/m and we use again the definition F := 
(/i^/2)[C'gC'i/m^ 4- ^C^C^Irn?' -f Cg/m]. Considering this 
definition and observing the relation in (69) we can write 

||atfc+i - at*(4+i)|| ^ p'"\xk - a:*(4)|| + p'" + 7"). (77) 

Now recursively apply the relationship (77) backwards in time 
to the initial time sample and use the same argument from (70) 
to (74) to write 

||£Cfe+i - a;*(4+i)|| ^ p^('"+^)||a;o - £c*(fo)|| 

+ p"(7^2 + n / , . (78) 

L 1 - p J 

Note that relation (78) gives an upper bound for
$\|x_{k+1} - x^*(t_{k+1})\|$ in terms of the initial error
$\|x_0 - x^*(t_0)\|$ and an extra error term that sets the size of
the convergence neighborhood. If we substitute $k+1$ by $k$ in (78)
and recall the definitions of $\Gamma_2$ and $\Gamma$, then
the result in (16) follows.

For completeness, we show that $\rho < 1$ requires the stepsize
to be selected as $\gamma < 2/L$, which in turn enforces a finite
right-hand side in (78). Starting from the definition of $\rho$, we
require

$$\rho := \max\{|1 - \gamma m|,\ |1 - \gamma L|\} < 1. \tag{79}$$

The condition $|1 - \gamma L| < 1$ is equivalent to $0 < \gamma < 2/L$.
Since $m \le L$ by Assumptions 1 and 2, any such $\gamma$ also satisfies
$|1 - \gamma m| < 1$, and the condition $\gamma < 2/L$ follows. ■
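
The stepsize condition derived from (79) can be checked numerically. The sketch below evaluates the contraction factor for a few stepsizes; the function name `contraction_factor` and the values of `m` and `L` are assumptions chosen only for illustration.

```python
def contraction_factor(gamma, m, L):
    """rho = max{|1 - gamma*m|, |1 - gamma*L|}, as in (79)."""
    return max(abs(1.0 - gamma * m), abs(1.0 - gamma * L))


# Hypothetical strong-convexity and Lipschitz constants with m <= L.
m, L = 0.5, 4.0

# Any stepsize strictly between 0 and 2/L = 0.5 gives rho < 1 ...
for gamma in (0.05, 0.25, 0.4, 0.49):
    assert contraction_factor(gamma, m, L) < 1.0

# ... while stepsizes at or beyond 2/L do not contract.
for gamma in (0.5, 0.6, 1.0):
    assert contraction_factor(gamma, m, L) >= 1.0
```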

Appendix C 
Proof of Theorem 2 

We consider once again the proof of Theorem 1, in particular
Eq. (63) for $k = 0$, which accounts for the prediction step. For the
correction step, if we applied the Newton method once,
we would have

$$\|x_1 - x^*(t_1)\| \le \frac{C_1}{2m}\,\|x_{1|0} - x^*(t_1)\|^2. \tag{80}$$

We proceed to check the validity of (80). To do so, we first
simplify the notation as

$$\nabla_{xx} f_1 = \nabla_{xx} f(x_{1|0}; t_1), \quad \nabla_x f_1 = \nabla_x f(x_{1|0}; t_1), \quad \nabla_{xx} f_1^* = \nabla_{xx} f(x^*(t_1); t_1), \quad \nabla_x f_1^* = \nabla_x f(x^*(t_1); t_1). \tag{81}$$

Considering the update of the Newton method which is used
in the correction step of NTT we can write

$$\|x_1 - x^*(t_1)\| = \|x_{1|0} - \nabla_{xx} f_1^{-1}\,\nabla_x f_1 - x^*(t_1)\|. \tag{82}$$

By factoring out the Hessian inverse $\nabla_{xx} f_1^{-1}$ and using the fact
that the norm of a product is no larger than the product of the
norms, we can show that the right-hand side of (82) is bounded
above as

$$\|x_{1|0} - \nabla_{xx} f_1^{-1}\,\nabla_x f_1 - x^*(t_1)\| \le \|\nabla_{xx} f_1^{-1}\|\,\|\nabla_{xx} f_1\,(x_{1|0} - x^*(t_1)) - \nabla_x f_1\|. \tag{83}$$

Notice that the norm $\|\nabla_{xx} f_1^{-1}\|$ is bounded above by $1/m$
according to the strong convexity assumption. Further, the
optimality conditions imply $\nabla_x f_1^* = 0$. These observations
imply that we can rewrite (83) as

$$\|x_{1|0} - \nabla_{xx} f_1^{-1}\,\nabla_x f_1 - x^*(t_1)\| \le \frac{1}{m}\,\|\nabla_{xx} f_1\,(x_{1|0} - x^*(t_1)) - (\nabla_x f_1 - \nabla_x f_1^*)\|. \tag{84}$$

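Define $r_1 = x_{1|0} - x^*(t_1)$ and $\xi(\tau) = x^*(t_1) + \tau\,(x_{1|0} - x^*(t_1))$.
We now use the fundamental theorem of calculus
and the Lipschitz continuity of the Hessian (Assumption 2) to
upper bound the rightmost term of (84) as

$$\begin{aligned}
\|\nabla_{xx} f_1\, r_1 - (\nabla_x f_1 - \nabla_x f_1^*)\|
&= \left\|\nabla_{xx} f_1\, r_1 - \int_0^1 \nabla_{xx} f(\xi(\tau); t_1)\, r_1 \, d\tau\right\| \\
&= \left\|\left[\int_0^1 \big(\nabla_{xx} f_1 - \nabla_{xx} f(\xi(\tau); t_1)\big)\, d\tau\right] r_1\right\| \\
&\le \|r_1\| \int_0^1 \|\nabla_{xx} f_1 - \nabla_{xx} f(\xi(\tau); t_1)\|\, d\tau \\
&\le C_1 \|r_1\|^2 \int_0^1 (1 - \tau)\, d\tau = \frac{C_1}{2}\,\|r_1\|^2.
\end{aligned} \tag{85}$$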

Notice that the first inequality in (85) is implied by the
Cauchy-Schwarz inequality and the second inequality is true
because of the Lipschitz continuity of the Hessian with constant
$C_1$, together with $\|x_{1|0} - \xi(\tau)\| = (1 - \tau)\|r_1\|$. By plugging
the bound (85) into (84) and recalling
the definition $r_1 = x_{1|0} - x^*(t_1)$ we obtain that

$$\|x_{1|0} - \nabla_{xx} f_1^{-1}\,\nabla_x f_1 - x^*(t_1)\| \le \frac{C_1}{2m}\,\|x_{1|0} - x^*(t_1)\|^2. \tag{86}$$

Combining the relations in (82) and (86) yields the claim
in (80).
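
To illustrate the quadratic contraction in (80), the sketch below applies one Newton step to a scalar, time-invariant stand-in objective. The choice $f(x) = x^2 + \cos x$ is an assumption made for this example only: it has $f''(x) = 2 - \cos x \ge 1$, so $m = 1$, and $|f'''(x)| = |\sin x| \le 1$, so $C_1 = 1$, with minimizer $x^* = 0$.

```python
import math


def newton_step(x):
    """One Newton step on the stand-in objective f(x) = x^2 + cos(x)."""
    grad = 2.0 * x - math.sin(x)  # f'(x)
    hess = 2.0 - math.cos(x)      # f''(x), always in [1, 3]
    return x - grad / hess


m, C1 = 1.0, 1.0  # strong-convexity and Hessian-Lipschitz constants of the stand-in
x_star = 0.0      # unique minimizer, since f'(0) = 0 and f is strongly convex

for x in (-2.0, -0.3, 0.1, 0.5, 1.5):
    error_after = abs(newton_step(x) - x_star)
    bound = C1 / (2.0 * m) * (x - x_star) ** 2  # right-hand side of (80)
    assert error_after <= bound + 1e-15
```

Here the starting point `x` plays the role of the predicted variable $x_{1|0}$ and the output of `newton_step` plays the role of $x_1$.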

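Now consider the case that $\tau$ steps of the Newton method
are applied in the correction step of the NTT algorithm. Then,
applying (80) recursively $\tau$ times with $Q := 2m/C_1$, the error
$\|x_1 - x^*(t_1)\|$ at step $t_1$ is bounded above as

$$\|x_1 - x^*(t_1)\| \le Q^{-(2^\tau - 1)}\,\|x_{1|0} - x^*(t_1)\|^{2^\tau}. \tag{87}$$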


Notice that the upper bound for the prediction error in (63)
implies that the norm $\|x_{1|0} - x^*(t_1)\|$ is bounded above as

$$\|x_{1|0} - x^*(t_1)\| \le \sigma\,\|x_0 - x^*(t_0)\| + \frac{h^2}{2}\left[\frac{C_0^2 C_1}{m^3} + \frac{2 C_0 C_2}{m^2} + \frac{C_3}{m}\right], \tag{88}$$


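where $\sigma := 1 + h\delta_1$, and $\delta_1$ is defined in (21). Combining the
inequalities in (87) and (88) and considering the definitions
$Q := 2m/C_1$ and $\delta_2 := C_0^2 C_1/(2m^3) + C_0 C_2/m^2 + C_3/(2m)$
yield

$$\|x_1 - x^*(t_1)\| \le Q^{-(2^\tau - 1)}\left(\sigma\,\|x_0 - x^*(t_0)\| + h^2 \delta_2\right)^{2^\tau}. \tag{89}$$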


Based on the assumption in (23) the initial error is bounded
above by $ch^2$ (with $c$ an arbitrary positive constant). Substituting
this upper bound into the right-hand side of (89) yields

$$\|x_1 - x^*(t_1)\| \le Q^{-(2^\tau - 1)}\left((\sigma c + \delta_2)\,h^2\right)^{2^\tau}. \tag{90}$$

Notice that the inequality in (90) shows that the error
$\|x_1 - x^*(t_1)\|$ at step $t_1$ is of the order $O(h^{2^{\tau+1}})$,
which is a better error bound than that of the initial
error $\|x_0 - x^*(t_0)\| = O(h^2)$. We now proceed to find
the conditions under which the bound in inequality (90) is valid
for all $\|x_k - x^*(t_k)\|$ with $k > 1$. To do so, we use
induction. We first establish sufficient conditions for which
$\|x_1 - x^*(t_1)\| \le ch^2$; then we replace $\|x_1 - x^*(t_1)\|$ by
$\|x_2 - x^*(t_2)\|$ and $\|x_0 - x^*(t_0)\|$ by $\|x_1 - x^*(t_1)\|$ in (89),
and by induction on the error term $\|x_k - x^*(t_k)\|$ we
prove the claim that $\|x_k - x^*(t_k)\| = O(h^{2^{\tau+1}})$ for $k > 1$. In
particular, we need to make sure that the sampling period $h$ is
chosen such that the upper bound in (90) is smaller than $ch^2$,
i.e.,

$$Q^{-(2^\tau - 1)}\left((\sigma c + \delta_2)\,h^2\right)^{2^\tau} \le c\,h^2. \tag{91}$$


Observe that according to the required condition for the
sampling period $h$ in (22) we can write $h \le 1$. Therefore, the
constant $\sigma := 1 + h\delta_1$ is bounded above by $1 + \delta_1$. Substituting
$1 + \delta_1$ for $\sigma$ in (91) implies a sufficient condition for (91) as

$$Q^{-(2^\tau - 1)}\left((1 + \delta_1)c + \delta_2\right)^{2^\tau} h^{2^{\tau+1}} \le c\,h^2. \tag{92}$$


We emphasize that if the inequality in (92) holds true then
the statement in (91) is satisfied. Regrouping the terms in (92)
leads to the following condition for the sampling interval $h$:

$$h \le \left[\frac{c\,Q^{2^\tau - 1}}{\left((1 + \delta_1)c + \delta_2\right)^{2^\tau}}\right]^{\frac{1}{2^{\tau+1} - 2}}. \tag{93}$$


Therefore, if (93) is satisfied then (92) and subsequently (91)
are satisfied. Based on the assumption in (22), we know
that (93) is valid and the condition in (91) is satisfied. This
observation in conjunction with the inequality in (90) implies
that

$$\|x_1 - x^*(t_1)\| \le c\,h^2. \tag{94}$$


By starting again from (89), and by replacing $\|x_1 - x^*(t_1)\|$
by $\|x_2 - x^*(t_2)\|$ and $\|x_0 - x^*(t_0)\|$ by $\|x_1 - x^*(t_1)\|$,
we arrive at the inequality

$$\|x_2 - x^*(t_2)\| \le Q^{-(2^\tau - 1)}\left((\sigma c + \delta_2)\,h^2\right)^{2^\tau}. \tag{95}$$

Since the condition in (93) does not depend on the optimality
gap, (95) and (93) yield $\|x_2 - x^*(t_2)\| \le ch^2$. By applying the
induction argument, we can now show that

$$\|x_k - x^*(t_k)\| \le Q^{-(2^\tau - 1)}\left((\sigma c + \delta_2)\,h^2\right)^{2^\tau}, \tag{96}$$

for all $k \ge 1$, which is (24). ■
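
The sampling-period condition used in the induction can be evaluated numerically. The sketch below assumes the exponent $2^\tau$ obtained by applying the quadratic contraction (80) $\tau$ times, and it also enforces $h \le 1$ as required by (22). The function names and all constants are hypothetical placeholders, not values from the paper.

```python
def max_sampling_period(c, Q, delta1, delta2, tau):
    """Largest h allowed by the regrouped condition (93), capped at 1.

    Assumes the exponent p = 2**tau from tau quadratic Newton contractions.
    """
    p = 2 ** tau
    ratio = c * Q ** (p - 1) / ((1.0 + delta1) * c + delta2) ** p
    return min(1.0, ratio ** (1.0 / (2 * p - 2)))


def satisfies_92(h, c, Q, delta1, delta2, tau):
    """Check the sufficient condition (92): left-hand side <= c*h^2."""
    p = 2 ** tau
    lhs = Q ** (-(p - 1)) * ((1.0 + delta1) * c + delta2) ** p * h ** (2 * p)
    return lhs <= c * h ** 2


# Hypothetical constants, chosen only to exercise the formulas.
c, Q, delta1, delta2 = 1.0, 2.0, 0.5, 1.0
for tau in (1, 2, 3):
    h_max = max_sampling_period(c, Q, delta1, delta2, tau)
    for fraction in (0.25, 0.5, 0.99):
        assert satisfies_92(fraction * h_max, c, Q, delta1, delta2, tau)
```

Because the threshold does not involve the current optimality gap, the same check applies at every time step, which is what the induction argument relies on.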


Appendix D 
Proof of Theorem 3 

We prove Theorem 3 by evaluating the extra error term
coming from the approximate time derivative in (10). In
particular, consider the Taylor expansion of the gradient
$\nabla_x f(x_k; t_{k-1})$ near the point $(x_k, t_k)$, which is given by

$$\nabla_x f(x_k; t_{k-1}) = \nabla_x f(x_k; t_k) - h\,\nabla_{tx} f(x_k; t_k) + \frac{h^2}{2}\,\nabla_{ttx} f(x_k; s), \tag{97}$$

for a particular $s \in [t_{k-1}, t_k]$. Regrouping the terms in (97),
it follows that the partial mixed gradient $\nabla_{tx} f(x_k; t_k)$ can be
written as

$$\nabla_{tx} f(x_k; t_k) = \frac{\nabla_x f(x_k; t_k) - \nabla_x f(x_k; t_{k-1})}{h} + \frac{h}{2}\,\nabla_{ttx} f(x_k; s). \tag{98}$$

Considering the definition of the approximate partial mixed
gradient $\tilde{\nabla}_{tx} f(x_k; t_k)$ in (10) and the expression for the exact
mixed gradient $\nabla_{tx} f(x_k; t_k)$ in (98), we obtain that

$$\nabla_{tx} f(x_k; t_k) - \tilde{\nabla}_{tx} f(x_k; t_k) = \frac{h}{2}\,\nabla_{ttx} f(x_k; s). \tag{99}$$

Based on Assumption 2 the norm $\|\nabla_{ttx} f(x_k; s)\|$ is bounded
above by $C_3$. Therefore, the error of the partial mixed gradient
approximation is upper bounded by

$$\|\nabla_{tx} f(x_k; t_k) - \tilde{\nabla}_{tx} f(x_k; t_k)\| \le \frac{h C_3}{2}. \tag{100}$$

Consider the approximate prediction step of the AGT algorithm
in (11). By adding and subtracting the exact prediction
direction $h\,[\nabla_{xx} f(x_k; t_k)]^{-1}\nabla_{tx} f(x_k; t_k)$ to the right-hand side of
the update in (11) we obtain

$$x_{k+1|k} = x_k - h\,[\nabla_{xx} f(x_k; t_k)]^{-1}\nabla_{tx} f(x_k; t_k) + h\,[\nabla_{xx} f(x_k; t_k)]^{-1}\left(\nabla_{tx} f(x_k; t_k) - \tilde{\nabla}_{tx} f(x_k; t_k)\right). \tag{101}$$

Subtracting $x^*(t_{k+1}) = x^*(t_k) - h\,[\nabla_{xx} f^*]^{-1}\nabla_{tx} f^* + \Delta_k$
in (52) from (101), and applying the triangle inequality lead
to

$$\begin{aligned}
\|x_{k+1|k} - x^*(t_{k+1})\| \le{}& \|x_k - x^*(t_k)\| + h\,\|[\nabla_{xx} f]^{-1}\nabla_{tx} f - [\nabla_{xx} f^*]^{-1}\nabla_{tx} f^*\| + \|\Delta_k\| \\
&+ h\,\left\|[\nabla_{xx} f(x_k; t_k)]^{-1}\left(\nabla_{tx} f(x_k; t_k) - \tilde{\nabla}_{tx} f(x_k; t_k)\right)\right\|.
\end{aligned} \tag{102}$$

Observe the upper bound for the norm $\|\Delta_k\|$ in (14). Further, observe
that $\|[\nabla_{xx} f(x_k; t_k)]^{-1}(\nabla_{tx} f(x_k; t_k) - \tilde{\nabla}_{tx} f(x_k; t_k))\|$ is
bounded above by $C_3 h/(2m)$ according to (100) and
Assumption 2. Substituting these upper bounds into (102)
yields

$$\|x_{k+1|k} - x^*(t_{k+1})\| \le \|x_k - x^*(t_k)\| + \frac{h^2}{2}\left[\frac{C_0^2 C_1}{m^3} + \frac{2 C_0 C_2}{m^2} + \frac{2 C_3}{m}\right] + h\,\|[\nabla_{xx} f]^{-1}\nabla_{tx} f - [\nabla_{xx} f^*]^{-1}\nabla_{tx} f^*\|. \tag{103}$$

Observe that the inequality for the AGT algorithm in (103) is
identical to the result for the GTT method in (54) except for
the multiplier of $h^2$. This observation implies that by following
the same steps from (55) to (74) we can prove the claim in
(28). Likewise, if we redo the steps from (75) to (78), the
claim in (30) follows from the result in (103). ■


Appendix E
Proof of Theorem 4


The proof of Theorem 4 is based on the proofs of Theorems 2
and 3. Since the correction steps of NTT and ANT are identical,
we can redo the steps from (80) to (87) to show that

$$\|x_1 - x^*(t_1)\| \le Q^{-(2^\tau - 1)}\,\|x_{1|0} - x^*(t_1)\|^{2^\tau}, \tag{104}$$

where $Q = 2m/C_1$. The prediction steps of AGT and ANT are
identical, therefore the result in (103) also holds true for ANT.
Consider the result in (103) for $k = 0$. Using the inequality
in (62) we can simplify the right-hand side of (103) as

$$\|x_{1|0} - x^*(t_1)\| \le \sigma\,\|x_0 - x^*(t_0)\| + \frac{h^2}{2}\left[\frac{C_0^2 C_1}{m^3} + \frac{2 C_0 C_2}{m^2} + \frac{2 C_3}{m}\right], \tag{105}$$

where $\sigma = 1 + h\,(C_0 C_1/m^2 + C_2/m)$. Combining the inequalities
in (104) and (105) and considering the definition of $\delta_2$
in (31) lead to

$$\|x_1 - x^*(t_1)\| \le Q^{-(2^\tau - 1)}\left(\sigma\,\|x_0 - x^*(t_0)\| + h^2 \delta_2\right)^{2^\tau}. \tag{106}$$

The result for ANT in (106) is similar to the result for NTT 
in (89). By following the steps from (90) to (96) the result in 
(34) follows. ■ 
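
The only ingredient that separates AGT and ANT from their exact counterparts is the backward-difference estimate of the mixed partial derivative, whose error is bounded in (100). The sketch below checks that bound on a scalar stand-in objective $f(x; t) = \tfrac{1}{2}(x - \cos t)^2$, an assumption made for this example only; for it, $\nabla_x f = x - \cos t$, $\nabla_{tx} f = \sin t$, and $|\nabla_{ttx} f| = |\cos t| \le 1$, so $C_3 = 1$.

```python
import math


def grad_x(x, t):
    """Gradient in x of the stand-in objective f(x; t) = 0.5*(x - cos t)^2."""
    return x - math.cos(t)


def mixed_exact(x, t):
    """Exact mixed partial derivative of f with respect to t and x."""
    return math.sin(t)


def mixed_backward(x, t, h):
    """Backward-difference estimate from two gradient samples, in the spirit of (10)."""
    return (grad_x(x, t) - grad_x(x, t - h)) / h


C3 = 1.0  # bound on |d^2/dt^2 grad_x f| for the stand-in objective
for h in (0.1, 0.01, 0.001):
    for t in (0.0, 0.7, 2.0):
        error = abs(mixed_exact(1.0, t) - mixed_backward(1.0, t, h))
        assert error <= h * C3 / 2.0 + 1e-12  # the bound in (100)
```

Halving `h` roughly halves the worst-case estimation error, which after multiplication by $h$ in the prediction step contributes only an $O(h^2)$ term, in line with (103).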